Abstract: Kiefer and Wolfowitz [Z. Wahrsch. Verw. Gebiete 34 (1976) 73-85]
showed that if $F$ is a strictly curved concave distribution function
(corresponding to a strictly monotone density $f$), then the Maximum Likelihood
Estimator $\widehat{F}_n$, which is, in fact, the least concave majorant of the empirical
distribution function $\mathbb{F}_n$, differs from the empirical distribution function in
the uniform norm by no more than a constant times $(n^{-1}\log n)^{2/3}$ almost
surely. We review their result and give an updated version of their proof. We
prove a comparable theorem for the class of distribution functions $F$ with
convex decreasing densities $f$, but with the maximum likelihood estimator $\widehat{F}_n$
of $F$ replaced by the least squares estimator $\widetilde{F}_n$: if $X_1, \ldots, X_n$ are sampled
from a distribution function $F$ with strictly convex density $f$, then the least
squares estimator $\widetilde{F}_n$ of $F$ and the empirical distribution function $\mathbb{F}_n$ differ
in the uniform norm by no more than a constant times $(n^{-1}\log n)^{3/5}$ almost
surely. The proofs rely on bounds on the interpolation error for complete spline
interpolation due to Hall [J. Approximation Theory 1 (1968) 209-218], Hall
and Meyer [J. Approximation Theory 16 (1976) 105-122], building on earlier
work by Birkhoff and de Boor [J. Math. Mech. 13 (1964) 827-835]. These
results, which are crucial for the developments here, are all nicely summarized
and exposited in de Boor [A Practical Guide to Splines (2001) Springer, New
York].



1. Introduction: The Monotone Case 

Suppose that $X_1, \ldots, X_n$ are i.i.d. with monotone decreasing density $f$ on $(0,\infty)$.
Then the maximum likelihood estimator $\widehat{f}_n$ of $f$ is the well-known Grenander
estimator: i.e. the left-derivative of the least concave majorant $\widehat{F}_n$ of the empirical
distribution function $\mathbb{F}_n$.
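The construction just described can be sketched in a few lines. The following is a minimal illustration, not code from the paper: it assumes distinct positive observations, and the function names (`least_concave_majorant`, `lcm_value`, `sup_distance`) are hypothetical. It builds the least concave majorant as the upper convex hull of the points $(0,0)$, $(X_{(i)}, i/n)$ and evaluates the uniform distance between the majorant and the empirical distribution function.

```python
def least_concave_majorant(points):
    """Vertices of the upper convex hull of points sorted by x-coordinate."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or below the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def lcm_value(hull, x):
    """Evaluate the piecewise linear majorant at x by linear interpolation."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the range of the hull")


def sup_distance(sample):
    """sup_t |LCM(t) - ecdf(t)| for a sample of distinct positive values."""
    xs = sorted(sample)
    n = len(xs)
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = least_concave_majorant(points)
    # The majorant dominates the ecdf, and the gap is largest just to the
    # left of an order statistic, where the ecdf still equals i/n (0-based i).
    return max(lcm_value(hull, x) - i / n for i, x in enumerate(xs))
```

The slopes of consecutive hull segments give the Grenander estimator itself; only the distribution function level is needed for the distance studied here.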

In the context of estimating a decreasing density $f$ so that the corresponding
distribution function $F$ is concave, Marshall [17] showed that $\widehat{F}_n$ satisfies
$\|\widehat{F}_n - F\| \le \|\mathbb{F}_n - F\|$ so that we automatically have
$\sqrt{n}\|\widehat{F}_n - F\| \le \sqrt{n}\|\mathbb{F}_n - F\| = O_p(1)$.
Kiefer and Wolfowitz [14] sharpened this by proving the following theorem under
strict monotonicity of $f$ (and consequent strict concavity of $F$). Let
$\alpha_1(F) = \inf\{t : F(t) = 1\}$, and write $\|g\| = \sup_{0 \le t \le \alpha_1(F)} |g(t)|$.

Theorem 1.1 (Kiefer-Wolfowitz [14]). If $\alpha_1(F) < \infty$,

    $\beta_1(F) \equiv \inf_{0<t<\alpha_1(F)} (-f'(t)/f^2(t)) > 0,$

$\gamma_1(F) \equiv \sup_{0<t<\alpha_1(F)} (-f'(t)) / \inf_{0<t<\alpha_1(F)} f^2(t) < \infty$, and $f'$ is continuous on
$[0,\alpha_1(F)]$, then

(1)    $\|\widehat{F}_n - \mathbb{F}_n\| = O((n^{-1}\log n)^{2/3})$ almost surely.

Although Kiefer and Wolfowitz did not formulate their result in this way, the
statement above follows from their proof. Also note that (1) implies that

    $\sqrt{n}\,\|\widehat{F}_n - \mathbb{F}_n\| = O(n^{-1/6}(\log n)^{2/3}) \to 0$

almost surely, so that the MLE $\widehat{F}_n$ and the empirical distribution function are
asymptotically equivalent under the hypotheses of Theorem 1.1.

Kiefer and Wolfowitz [14] used Theorem 1.1 to show that the MLE $\widehat{F}_n$ of $F$
in the class of concave distributions is an asymptotically minimax estimator of $F$.
(Also see Kiefer and Wolfowitz for a generalization of the results of Kiefer and
Wolfowitz [14] to allow somewhat weaker conditions.)

It follows from the rather general theorem of Millar [18] that the empirical
distribution function $\mathbb{F}_n$ remains asymptotically minimax in a wide range of problems
involving shape-constrained families of d.f.'s $\mathcal{F}$. In particular, for the classes $\mathcal{F}_k$ of
distribution functions corresponding to $k$-monotone densities, it follows from Millar
[18] that the empirical distribution function $\mathbb{F}_n$ is asymptotically minimax for
estimation of $F$ even in the smaller classes $\mathcal{F}_k$. The interesting question which has not
been addressed concerns asymptotic minimaxity of the MLEs within these classes.
Our goal in this paper is to make some headway toward answering these questions
by giving a partial (and imperfect) analogue of Theorem 1.1 in the case of $\mathcal{F}_2$, the
class of distribution functions corresponding to the class of decreasing and convex
densities. The MLE and least squares estimators of a density $f$ corresponding to
$F \in \mathcal{F}_2$ have been studied by Groeneboom, Jongbloed and Wellner [11], and those
results will provide an important starting point here.

In fact, we will not study the MLE, but its natural surrogate, the least squares
estimator. This is because of the lack of a complete analogue of Marshall's lemma
for the MLE in the convex case, while we do have such analogues for the least
squares estimator; see Dümbgen, Rufibach and Wellner and Balabdaoui and
Rufibach [1].

One view of the Kiefer-Wolfowitz Theorem 1.1 is that it is driven by the (family
of) corresponding local results, as follows:

Theorem 1.2 (Local process convergence, monotone case). Suppose that $t_0 \in (0,\infty)$
is fixed with $f(t_0) > 0$ and $f'(t_0) < 0$, and $f$ and $f'$ continuous in a
neighborhood of $t_0$. Then

    $n^{2/3}\big(\widehat{F}_n(t_0 + n^{-1/3}t) - \mathbb{F}_n(t_0 + n^{-1/3}t)\big)$

(2)    $\Rightarrow C_{b,c}(t) - Y_1(t) \stackrel{d}{=} (b^4/c)^{1/3}\{C(at) - (W(at) - a^2t^2)\}$

in $(D[-K,K], \|\cdot\|)$ for every $K > 0$ where

    $Y_1(t) \equiv \sqrt{f(t_0)}\,W(t) + (1/2) f'(t_0)t^2 \equiv bW(t) - ct^2$

for $W$ a standard two-sided Brownian motion process starting from 0, $C_{b,c}$ is the
Least Concave Majorant of $Y_1$, $C \equiv C_{1,1}$ is the least concave majorant of $W(t) - t^2$,
and $a \equiv \big([f'(t_0)]^2/(4f(t_0))\big)^{1/3}$.

The (one-dimensional) special case of (2) with $t = 0$ is due to Wang, while
the complete result is given by Kulikov and Lopuhaa [16].

Here the logarithmic term on the right side of (1) reflects the cost of transferring
the family of (in distribution) local results to an (almost sure) global result. Here
is a heuristic proof of (2); for the complete proof, see Kulikov and Lopuhaa [16].
For a similar result in the context of monotone regression, see Durot and Tocquet,
and for a similar theorem in the context of the Wicksell problem studied by
Groeneboom and Jongbloed, see Wang and Woodroofe [25]. For a related result
in the context of estimation of an increasing failure rate, see Wang.

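Proof of Theorem 1.2. We rewrite the left side of (2) as

    $n^{2/3}\{\widehat{F}_n(t_0 + n^{-1/3}t) - \mathbb{F}_n(t_0 + n^{-1/3}t)\}$
(3)    $= n^{2/3}\{\widehat{F}_n(t_0 + n^{-1/3}t) - F(t_0) - n^{-1/3}f(t_0)t\}$
       $\quad - n^{2/3}\{\mathbb{F}_n(t_0 + n^{-1/3}t) - \mathbb{F}_n(t_0) - n^{-1/3}f(t_0)t\}$
       $\quad + n^{2/3}\{\widehat{F}_n(t_0) - \mathbb{F}_n(t_0) - (\widehat{F}_n(\tau_0^-) - \mathbb{F}_n(\tau_0^-))\}$
       $\quad - n^{2/3}\{\widehat{F}_n(t_0) - F(t_0)\}$

where $\tau_0^-$ is the first point of touch of $\widehat{F}_n$ and $\mathbb{F}_n$ to the left of $t_0$. From known local
theory for $\widehat{F}_n$ and $\mathbb{F}_n$ it follows easily that

(4)    $n^{2/3}\{\mathbb{F}_n(t_0 + n^{-1/3}t) - \mathbb{F}_n(t_0) - n^{-1/3}f(t_0)t\} \Rightarrow \sqrt{f(t_0)}\,W(t) + \tfrac{1}{2}f'(t_0)t^2 = Y_1(t),$

(5)    $n^{2/3}\{\widehat{F}_n(t_0 + n^{-1/3}t) - F(t_0) - n^{-1/3}f(t_0)t\} \Rightarrow C_{b,c}(t)$

and

(6)    $n^{2/3}\{\widehat{F}_n(t_0) - F(t_0)\} \Rightarrow C_{b,c}(0)$

where $C_{b,c}$ is the least concave majorant of $Y_1$. It remains to handle the third term.
But since $\widehat{F}_n(t_0) - \widehat{F}_n(\tau_0^-) = \widehat{f}_n(t_0)(t_0 - \tau_0^-)$ by linearity of $\widehat{F}_n$ on $(\tau_0^-, \tau_0^+)$,

    $n^{2/3}\{\widehat{F}_n(t_0) - \mathbb{F}_n(t_0) - (\widehat{F}_n(\tau_0^-) - \mathbb{F}_n(\tau_0^-))\}$
    $= -n^{2/3}\big(\mathbb{F}_n(t_0) - \mathbb{F}_n(\tau_0^-) - \widehat{f}_n(t_0)(t_0 - \tau_0^-)\big)$
    $= -n^{2/3}\big(\mathbb{F}_n(t_0) - \mathbb{F}_n(\tau_0^-) - f(t_0)(t_0 - \tau_0^-)\big) + n^{2/3}(\widehat{f}_n(t_0) - f(t_0))(t_0 - \tau_0^-)$
    $= n^{2/3}\{\mathbb{F}_n(t_0 + n^{-1/3}n^{1/3}(\tau_0^- - t_0)) - \mathbb{F}_n(t_0) - f(t_0)n^{-1/3}n^{1/3}(\tau_0^- - t_0)\}$
    $\quad - n^{1/3}(\widehat{f}_n(t_0) - f(t_0))\,n^{1/3}(\tau_0^- - t_0)$
    $\to_d Y_1(\tau_-) - C_{b,c}^{(1)}(0)\tau_- = Y_1(\tau_-) - \{C_{b,c}(0) + C_{b,c}^{(1)}(0)\tau_-\} + C_{b,c}(0)$
(7)    $= Y_1(\tau_-) - C_{b,c}(\tau_-) + C_{b,c}(0) = C_{b,c}(0)$

where $\tau_-$ is the first point of touch of $Y_1$ and $C_{b,c}$ to the left of 0, and hence
$C_{b,c}(\tau_-) = Y_1(\tau_-)$. Combining (4), (5), (6) and (7) with (3) it follows that

    $n^{2/3}\{\widehat{F}_n(t_0 + n^{-1/3}t) - \mathbb{F}_n(t_0 + n^{-1/3}t)\} \Rightarrow C_{b,c}(t) - Y_1(t)$

in $(D[-K,K], \|\cdot\|)$ for each fixed $K > 0$. □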
2. The convex case

Now suppose that $X_1, \ldots, X_n$ are i.i.d. with monotone decreasing and convex
density $f$ on $(0,\infty)$. Then the maximum likelihood estimator $\widehat{f}_n$ of $f$ is a piecewise
linear, continuous and convex function with at most one change of slope between
the order statistics of the data, and, as shown by Groeneboom, Jongbloed and
Wellner [11], is characterized by

    $H_n(x, \widehat{f}_n) \begin{cases} \le 1, & x \ge 0, \\ = 1, & \text{if } \widehat{f}_n'(x-) < \widehat{f}_n'(x+), \end{cases}$

where, with $\mathcal{K}$ being the class of convex and decreasing and nonnegative functions
on $[0,\infty)$,

    $H_n(x, f) = \int_{[0,x]} \frac{2(x - y)/x^2}{f(y)}\, d\mathbb{F}_n(y), \qquad (x, f) \in \mathbb{R}^+ \times \mathcal{K}.$

As shown by Groeneboom, Jongbloed and Wellner, the least squares estimator
$\widetilde{f}_n$ of $f$ is also a piecewise linear, continuous, and convex function with at most one
change of slope between the order statistics, but is characterized by

    $\widetilde{H}_n(x) \begin{cases} \ge \mathbb{Y}_n(x), & x \ge 0, \\ = \mathbb{Y}_n(x), & \text{if } \widetilde{f}_n'(x-) < \widetilde{f}_n'(x+), \end{cases}$

where $\widetilde{H}_n(x) = \int_0^x \int_0^y \widetilde{f}_n(u)\,du\,dy = \int_0^x \widetilde{F}_n(y)\,dy$ and $\mathbb{Y}_n(x) = \int_0^x \mathbb{F}_n(y)\,dy$. The
corresponding estimators $\widehat{F}_n$ of $F$ and $\widehat{H}_n$ of $Y$ are given by $\widehat{F}_n(x) = \int_0^x \widehat{f}_n(y)\,dy$ and
$\widehat{H}_n(x) = \int_0^x \widehat{F}_n(y)\,dy$ respectively. Since pointwise limit theory for both the MLE
and the least squares estimators of $f$ are available from Groeneboom, Jongbloed
and Wellner [11], we begin by formulating a (family of) local convergence theorems
analogous to Theorem 1.2 in the monotone case. These will serve as a guide in
formulating appropriate hypotheses in the context of our global theorem.

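Theorem 2.1 (Local process convergence, convex case). If $f(t_0) > 0$,
$f''(t_0) > 0$, and $f(t)$ and $f''(t)$ are continuous in a neighborhood of $t_0$, then for
$(\bar{F}_n, \bar{H}_n) = (\widetilde{F}_n, \widetilde{H}_n)$ or for $(\bar{F}_n, \bar{H}_n) = (\widehat{F}_n, \widehat{H}_n)$,

(8)    $\begin{pmatrix} n^{3/5}(\bar{F}_n(t_0 + n^{-1/5}t) - \mathbb{F}_n(t_0 + n^{-1/5}t)) \\ n^{4/5}(\bar{H}_n(t_0 + n^{-1/5}t) - \mathbb{Y}_n(t_0 + n^{-1/5}t)) \end{pmatrix} \Rightarrow \begin{pmatrix} H_2^{(1)}(t) - Y_2^{(1)}(t) \\ H_2(t) - Y_2(t) \end{pmatrix}$
       $\stackrel{d}{=} \begin{pmatrix} \big(24 f(t_0)^3/f''(t_0)\big)^{1/5}\,(H_{2,s}^{(1)}(at) - Y_{2,s}^{(1)}(at)) \\ \big(24^3 f(t_0)^4/f''(t_0)^3\big)^{1/5}\,(H_{2,s}(at) - Y_{2,s}(at)) \end{pmatrix}$

in $(D[-K,K], \|\cdot\|)$ for every $K > 0$ where

    $Y_2(t) \equiv \sqrt{f(t_0)} \int_0^t W(s)\,ds + \frac{1}{24} f''(t_0)t^4$

and $H_2$ is the "invelope" process corresponding to $Y_2$: i.e. $H_2$ satisfies: (a) $H_2(t) \ge Y_2(t)$
for all $t$; (b) $\int_{-\infty}^{\infty} (H_2 - Y_2)\,dH_2^{(3)} = 0$; and (c) $H_2^{(2)}$ is convex. Here

    $a = \left(\frac{f''(t_0)^2}{24^2 f(t_0)}\right)^{1/5},$

and $H_{2,s}$, $Y_{2,s}$ denote the "standard" versions of $H_2$ and $Y_2$ with coefficients 1: i.e.
$Y_{2,s}(t) = \int_0^t W(s)\,ds + t^4$.

Note that $\beta_2(F) \equiv \inf_{0<t<\alpha_1(F)} (f''(t)/f^3(t))$ is invariant under scale changes of
$F$, while $\delta_2(F) \equiv \sup_{0<t<\alpha_1(F)} (f''(t)^2/f(t))^{1/5}$ is equivariant under scale changes
of $F$; i.e. $\delta_2(F(c\,\cdot)) = c\,\delta_2(F)$.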

Proof. Here is a sketch of the proof of the convergence in the first coordinate of
(8). We write

n 3/b {F n {t a + n^'h) - ¥ n (t Q + n-^H)) 

= n 3 / 5 (F n (t a + n^'h) ~ F(t ) ~ rr 1/5 ~/(*o)* 3 ) 

6 

- n 3 / 5 (F„(i + n-^h) - F„(t ) - rT^i/ft)* 3 ) 

6 

+ n 3 / 5 (F„(t ) - F„(t ) - (F„(t -) - F„(r -)) 
-n 3 / 5 (F„(i )-F(i )). 

Here 

n 3/5 (F n (t + n-VH) - F(t ) - n^^f^t 3 ^ H«(i), 

n 3/5 (V„(t + n- 1 / 8 *) - F„(t ) - n" 1 / 5 ^)^ f^t), 
n 3/5 (F„(to)-F(i ))^H( 1) (0), 

while 

n 3/5 (F„(t ) - F n (t ) - (F n (r -) - F n (r -)) 

= n 3 / 5 (¥ n (t + n- 1 ^ 5 ^ - t Q )) - ¥ n (t ) 

n- 1/5 lf(to)(n 1/5 (To-to)r 
- n 3 / 5 (F tt (t + n-^n 1 ^^- - t )) - F(i ) 

+ n 3 / 5 (F n (t )-F(t Q )) 
->d Y 2 1} (r-) - 4 X) (r_) + H 2 1} (0) = H 2 1} (0) 

since $Y_2^{(1)}(\tau_-) = H_2^{(1)}(\tau_-)$. Combining the pieces yields the claim.

The proof for the second coordinate is similar. □ 

Now we can formulate our main result. Fix $\tau < \alpha_1(F)$. Our hypotheses are as
follows:

R1. $F$ has continuous third derivative $F^{(3)}(t) = f''(t) > 0$ for $t \in [0,\tau]$ and
    $\beta_2(F,\tau) \equiv \inf_{0<t<\tau} (f''(t)/f^3(t)) > 0$.
R2. $\gamma_1(F,\tau) \equiv \sup_{0<t<\tau} (-f'(t)/f^2(t)) < \infty$.
R3. $\gamma_2(F,\tau) \equiv \sup_{0<t<\tau} f''(t) / \inf_{0<t<\tau} f^3(t) < \infty$.
R4. $R \equiv \max\{1, \sup_{0<t<\tau} f(t)\}/\inf_{0<t<\tau} f(t) = \max\{1, f(0)\}/f(\tau) < \infty$.

In the rest of the paper we fix $\tau \in (0, \alpha_1(F))$ such that R1-R4 hold, and let
$\|h\| \equiv \sup_{0 \le t \le \tau} |h(t)|$ denote the supremum norm of the real-valued function $h$ on $[0,\tau]$.

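Theorem 2.2. Suppose that R1-R4 hold. Then

(9)     $\|\widetilde{F}_n - \mathbb{F}_n\| \equiv \sup_{0 \le t \le \tau} |\widetilde{F}_n(t) - \mathbb{F}_n(t)| = O((n^{-1}\log n)^{3/5}),$

(10)    $\|\widetilde{H}_n - \mathbb{Y}_n\| \equiv \sup_{0 \le t \le \tau} |\widetilde{H}_n(t) - \mathbb{Y}_n(t)| = O((n^{-1}\log n)^{4/5}),$

almost surely.

Note that (9) and (10) imply that

(11)    $n^{1/2}\|\widetilde{F}_n - \mathbb{F}_n\| = O(n^{-1/10}(\log n)^{3/5}),$

(12)    $n^{1/2}\|\widetilde{H}_n - \mathbb{Y}_n\| = O(n^{-3/10}(\log n)^{4/5}),$

almost surely.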

To prepare for the proof of Theorem 2.2, fix $0 < \tau < \alpha_1(F)$ for which the
hypotheses of Theorem 2.2 hold. For an integer $k \ge 2$ define $a_j \equiv a_j^{(k)} \equiv F^{-1}((j/k)F(\tau))$
for $j = 1, \ldots, k$, and set $a_0 \equiv a_0^{(k)} \equiv \alpha_0(F) \equiv \sup\{x : F(x) = 0\}$. Note
that $a_k^{(k)} = F^{-1}(F(\tau)) = \tau$ for all $k \ge 2$. We will often simply write $a_j$ for $a_j^{(k)}$, but
the dependence of the knots $\{a_j^{(k)}\}$ on $k$ (and the choice of $k$ depending on $n$) will be
crucial for our proofs. We also set $\Delta_j a \equiv a_j - a_{j-1}$, and write $|a| \equiv \max_{1 \le j \le k} \Delta_j a$.
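To make the knot construction concrete, here is a small sketch for one illustrative distribution that is not taken from the paper: $F(t) = 1 - (1-t)^3$ on $[0,1]$, whose density $f(t) = 3(1-t)^2$ is decreasing and strictly convex. The names `F`, `F_inv`, `knots` and `mesh` are hypothetical helpers. Each interval $[a_{j-1}, a_j]$ carries probability $F(\tau)/k$, so the spacings $\Delta_j a$ lengthen as $f$ decreases.

```python
def F(t):
    """Illustrative distribution function with density 3(1 - t)^2 on [0, 1]."""
    return 1.0 - (1.0 - t) ** 3


def F_inv(u):
    """Inverse of F on [0, 1)."""
    return 1.0 - (1.0 - u) ** (1.0 / 3.0)


def knots(k, tau):
    """Knots a_j = F^{-1}((j/k) F(tau)), j = 0, ..., k."""
    return [F_inv(j / k * F(tau)) for j in range(k + 1)]


def mesh(a):
    """|a| = max_j (a_j - a_{j-1})."""
    return max(right - left for left, right in zip(a, a[1:]))
```

For example, `knots(4, 0.5)` starts at 0, ends at (numerically) 0.5, and `mesh` returns the length of the last interval, where the density is smallest.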

Let $\mathbb{H}_{n,k}$ be the complete cubic spline interpolant of $\mathbb{Y}_n$ with knot points given by
$\{a_j^{(k)}, j = 0, \ldots, k\}$. Thus $\mathbb{H}_{n,k}$ is piecewise cubic on $[a_{j-1}, a_j]$, $j = 1, \ldots, k$ with two
continuous derivatives $\mathbb{H}_{n,k}^{(1)}$ and $\mathbb{H}_{n,k}^{(2)}$; see de Boor [5], pages 39-43 and 51-56. We
will choose $k = k_n \sim (Cn/\log n)^{1/5} \to \infty$ in our arguments. $\mathbb{H}_{n,k}^{(2)}$ is not necessarily
convex, but we will show that it becomes convex on $[0,\tau]$ with high probability as
$n \to \infty$, and hence $\mathbb{H}_{n,k_n}$ will play a role analogous to the role played by the linear
interpolation of $\mathbb{F}_n$ in the proofs of Kiefer and Wolfowitz [14]. (We will frequently
suppress the dependence of $k = k_n$ on $n$, and write simply $k$ for $k_n$.)

Let $Y$ be defined by $Y(t) \equiv \int_0^t F(s)\,ds$; thus $Y^{(1)} = F$, $Y^{(j)} = f^{(j-2)}$, for
$j \in \{2, 3, 4\}$. We will also need the complete cubic spline interpolant $H_{k_n}$ of $Y$; this
will play the role of the linear interpolant $L$ of $F$ in Kiefer and Wolfowitz [14].

The cubic spline interpolant $\mathbb{H}_{n,k}$ of $\mathbb{Y}_n$ based on the knot points $\{a_j^{(k)}, j = 0, \ldots, k\}$
is completely determined on $[0,\tau]$ by the values of $\mathbb{Y}_n$ at the knots $a_j$,
$j = 1, \ldots, k$ together with the values of $\mathbb{Y}_n^{(1)} = \mathbb{F}_n$ at 0 and $a_k = \tau$, namely $\mathbb{Y}_n(a_j)$,
$j = 1, \ldots, k$, $\mathbb{Y}_n^{(1)}(0) = \mathbb{F}_n(0) = 0$, and $\mathbb{Y}_n^{(1)}(\tau)$; see, e.g., de Boor [5], page 43. As de
Boor nicely explains in his Chapter IV, the complete cubic spline interpolant is one
case of a family of cubic interpolation methods. Taking de Boor's function $g$ to be
our present function $\mathbb{Y}_n$, several different piecewise cubic interpolants of $\mathbb{Y}_n$ can be
described in terms of cubic polynomials $P_j$ on each of the intervals $[a_j, a_{j+1}]$ where
the interpolating function $\mathbb{H}_n(\cdot; s)$ is given by $\mathbb{H}_n(x; s) = P_j(x; s)$ for $x \in [a_j, a_{j+1}]$,
$j = 0, \ldots, k-1$, and where we require

    $P_j(a_j) = \mathbb{Y}_n(a_j), \qquad P_j(a_{j+1}) = \mathbb{Y}_n(a_{j+1}),$
    $P_j'(a_j) = s_j, \qquad P_j'(a_{j+1}) = s_{j+1},$

for $j = 0, \ldots, k-1$. Here $s = (s_0, \ldots, s_k)$ and the $s_j$'s are free parameters. Different
choices of the $s_j$'s lead to different piecewise cubic functions agreeing with $\mathbb{Y}_n$ at the
knots $a_j$; all of these different approximating functions $\mathbb{H}_n(\cdot; s)$ are continuous and
have continuous first derivatives. Of interest to us here are the following particular
ways of determining the $s_j$'s:

• $s_j = \mathbb{Y}_n^{(1)}(a_j) = \mathbb{F}_n(a_j)$, $j = 0, \ldots, k$. This gives the piecewise cubic Hermite
interpolant of $\mathbb{Y}_n$, $\mathbb{H}_n(\cdot, s) \equiv \mathbb{H}_{n,Herm}$.

• $s_j$, $j = 0, \ldots, k$ chosen so that $\mathbb{H}_n(\cdot, s) \in C^2[0,\tau]$; i.e. so that $\mathbb{H}_n^{(2)}(\cdot, s)$ is
continuous and $s_0 = \mathbb{Y}_n^{(1)}(0) = 0$ and $s_k = \mathbb{Y}_n^{(1)}(a_k) = \mathbb{F}_n(\tau)$. This gives the
complete cubic spline interpolant of $\mathbb{Y}_n$, $\mathbb{H}_n(\cdot, s) \equiv \mathbb{H}_{n,CS}$.
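The two choices of slopes can be sketched numerically. The code below is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the knots are assumed strictly increasing, and the interior slopes of the complete spline are obtained from the standard tridiagonal system expressing continuity of the second derivative at the interior knots, solved by forward elimination and back substitution. The second function returns, for any slope vector (Hermite or complete spline), the constant third derivative of the cubic piece on each interval, that is, the slope of the piecewise linear second derivative.

```python
def complete_spline_slopes(x, y, s0, sk):
    """Slopes s_0..s_k at the knots making the piecewise cubic interpolant C^2,
    with prescribed end slopes s0 and sk (complete cubic spline)."""
    k = len(x) - 1
    if k == 1:
        return [s0, sk]
    h = [x[j + 1] - x[j] for j in range(k)]            # interval lengths
    d = [(y[j + 1] - y[j]) / h[j] for j in range(k)]   # divided differences
    # Row for interior knot j = 1..k-1:
    # h[j]*s[j-1] + 2*(h[j-1]+h[j])*s[j] + h[j-1]*s[j+1]
    #     = 3*(h[j]*d[j-1] + h[j-1]*d[j])
    lower = [h[j] for j in range(1, k)]
    diag = [2.0 * (h[j - 1] + h[j]) for j in range(1, k)]
    upper = [h[j - 1] for j in range(1, k)]
    rhs = [3.0 * (h[j] * d[j - 1] + h[j - 1] * d[j]) for j in range(1, k)]
    rhs[0] -= lower[0] * s0      # known end slopes move to the right side
    rhs[-1] -= upper[-1] * sk
    m = k - 1
    for i in range(1, m):        # forward elimination
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    s = [0.0] * m
    s[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):   # back substitution
        s[i] = (rhs[i] - upper[i] * s[i + 1]) / diag[i]
    return [s0] + s + [sk]


def second_derivative_slopes(x, y, s):
    """Third derivative of the cubic piece on each [x_j, x_{j+1}]:
    (12 / h^3) * ((s_j + s_{j+1}) / 2 * h - (y_{j+1} - y_j))."""
    out = []
    for j in range(len(x) - 1):
        h = x[j + 1] - x[j]
        out.append(12.0 / h ** 3 * (0.5 * (s[j] + s[j + 1]) * h - (y[j + 1] - y[j])))
    return out
```

A convenient sanity check is that the complete spline reproduces any cubic polynomial exactly: feeding values and end slopes of $g(x) = x^3 - 2x$ returns the slopes $g'(x_j)$ and a constant third derivative of 6 on every interval.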

The complete spline interpolant $\mathbb{H}_{n,CS}$ will play the role for us that the linear
interpolant $L_n$ of $\mathbb{F}_n$ played in Kiefer and Wolfowitz [14]. As we will see, however,
even though the Hermite interpolant $\mathbb{H}_{n,Herm}$ is not in $C^2[0,\tau]$ (i.e. $\mathbb{H}_{n,Herm}^{(2)}$ is
not continuous), the slopes of its piecewise linear second derivative can be given
explicitly in terms of $\mathbb{Y}_n$ and $\mathbb{Y}_n^{(1)} = \mathbb{F}_n$ at the knots, and our proof will proceed by
relating the slopes of $\mathbb{H}_{n,Herm}^{(2)}$ to the (more complicated and less explicit) slopes of
$\mathbb{H}_{n,CS}^{(2)} = \mathbb{H}_{n,k}^{(2)}$ in order to prove point B in the following outline of our proof.
Here is an outline of the proof, paralleling the proof of the K-W theorem.

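Main steps, proof of (9), distribution function equivalence:

A. By the generalization of Marshall's lemma for the convex density problem
(see Dümbgen, Rufibach and Wellner), for any function $h$ with convex
derivative $h'$, $\|\widetilde{H}_n^{(1)} - h\| \le 2\|\mathbb{F}_n - h\|$ where $\widetilde{H}_n^{(2)} = \widetilde{f}_n$. [This generalization
is not yet available for the MLE $\widehat{H}_n^{(1)}$ of $F$ in $\mathcal{F}_2$ corresponding to $\widehat{H}_n^{(2)} = \widehat{f}_n$;
see Dümbgen, Rufibach and Wellner for a one-sided result.]

B. $P_F(A_n) \equiv P_F\{\mathbb{H}_{n,k_n}^{(2)} \text{ is convex on } [0,\tau]\} \nearrow 1$ as $n \to \infty$ if
$k_n \equiv (C_0 \beta_2(F)^2 n/\log n)^{1/5}$ for some absolute constant $C_0$.

C. On the event $A_n$,

    $\|\widetilde{F}_n - \mathbb{F}_n\| = \|\widetilde{F}_n - \mathbb{H}_{n,k_n}^{(1)} + \mathbb{H}_{n,k_n}^{(1)} - \mathbb{F}_n\|$
    $\le 2\|\mathbb{F}_n - \mathbb{H}_{n,k_n}^{(1)}\| + \|\mathbb{H}_{n,k_n}^{(1)} - \mathbb{F}_n\|$
         by the generalization of Marshall's lemma (A)
    $= 3\|\mathbb{F}_n - \mathbb{H}_{n,k_n}^{(1)}\|$
    $= 3\|\mathbb{F}_n - \mathbb{H}_{n,k_n}^{(1)} - (F - H_{k_n}^{(1)}) + F - H_{k_n}^{(1)}\|$
    $\le 3\|\mathbb{F}_n - \mathbb{H}_{n,k_n}^{(1)} - (F - H_{k_n}^{(1)})\| + 3\|F - H_{k_n}^{(1)}\|$
    $\equiv 3D_n + 3E_n.$

D. We show that $D_n = O((n^{-1}\log n)^{3/5})$ almost surely via a generalization of
the K-W Lemma 2. We also show that $E_n = O((n^{-1}\log n)^{3/5})$ by an analytic
(deterministic) argument.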






Balabdaoui and Wellner 



Of course proving step B in this outline involves showing that the slopes of
$\mathbb{H}_{n,k}^{(2)}$ become ordered with high probability for large $n$, and this explains our
interest in the slopes of both $\mathbb{H}_{n,CS}^{(2)} = \mathbb{H}_{n,k}^{(2)}$ and $\mathbb{H}_{n,Herm}^{(2)}$.

The assertion (10) of Theorem 2.2 can be proved in a similar way if we replace
$\widetilde{H}_n^{(1)}$, $\mathbb{H}_{n,k_n}^{(1)}$, $H_{k_n}^{(1)}$, $\mathbb{F}_n$, $F$ by $\widetilde{H}_n$, $\mathbb{H}_{n,k_n}$, $H_{k_n}$, $\mathbb{Y}_n$, $Y$ respectively, and if we replace
A by the following recent result of Balabdaoui and Rufibach [1]:

A'. For any function $G$ with convex second derivative $g''$, $\|\widetilde{H}_n - G\| \le \|\mathbb{Y}_n - G\|$.

Proof of (9) assuming B. First the deterministic term $E_n$. As in de Boor [5], page
43, let $I_4$ denote the complete cubic spline interpolation operator, and (as in de Boor
[5], page 31) let $I_2$ be the piecewise linear (or "broken line") interpolation operator.
Then by de Boor [5], (20) on page 56, with $p_n \equiv 1/k_n$,

    $E_n = \|F - H_{k_n}^{(1)}\| = \|Y^{(1)} - (I_4 Y)^{(1)}\| \le \frac{1}{24}|a|^3\|Y^{(4)}\|$
    $\le \frac{1}{24}\gamma_2(F,\tau)\,p_n^3 = O((n^{-1}\log n)^{3/5}).$

To handle $D_n$, let $\$_3$ be defined to be the space of all quadratic splines on $[0,\tau]$,
and similarly let $\$_2$ be the space of all linear splines on $[0,\tau]$. Then, by de Boor [5],
page 56, equation (17), together with (18) on page 36, it follows that

D n = ||F„ - MIX -i p - H £)W = II ( Y « - Y ) {1) (A(Y« - ^)) (1) ll 
< ^dist((Y„-y)W;$ 3 ) < ^dist((Y„-y)( 1 );$ 2 ) 

<^||(Y„-y)W-/ 2 [(Y„-y)( 1 )]|| 

= ^||(F n -F)-7 2 (F„-F)|| 

<™u(F n -F;\a\) 
A ^ n -V2 w(Un . pn) 

= OirT 1 ! 2 ' \/ 'p n log(l/p„)) almost surely 
= 0((n -1 logra) 3/s ); 



here

    $\omega(g; h) \equiv \sup\{|g(t) - g(s)| : |t - s| \le h\},$

    $\mathrm{dist}(g; S) \equiv \min\{\|g - f\| : f \in S\}, \quad S \subset C[0,\tau],$ and

    $U_n(t) \equiv \sqrt{n}\,(G_n(t) - t)$

where $G_n(t) = n^{-1}\sum_{i=1}^n 1_{[0,t]}(\xi_i)$ is the empirical distribution function of $\xi_1, \ldots, \xi_n$
i.i.d. Uniform(0,1) random variables. (See de Boor [5], pages xviii, 24, 32, and 34 for
definition and use of $\mathrm{dist}(g; S)$ and the modulus of continuity $\omega$ in conjunction.) □

Proof of (10) assuming B. By Hall [12] (also see Hall and Meyer [13] for optimality
of the constant and de Boor [5], page 55),

    $E_n \equiv \|Y - H_{k_n}\| \le \frac{5}{384}|a|^4\|Y^{(4)}\| \le \frac{5}{384}R\,\gamma_2(F)\,p_n^4.$



A Kiefer-Wolfowitz theorem 






To handle the first term $D_n$, we note that

    $\mathbb{Y}_n - Y - (\mathbb{H}_{n,k_n} - H_{k_n}) = (\mathbb{Y}_n - Y) - I_4(\mathbb{Y}_n - Y)$

where $I_4$ is the complete spline interpolant, and, on the other hand, for any
differentiable function $g$ it follows from de Boor [5], page 45, equation (14), together
with (18) on page 36, that

19 19 

11.9 -hg\\ < -Mdist( 5 ',$ 3 ) < -Mdist( 5 ',$ 2 ) 

< yHI|9'-^II< y|a|w( 5 ',|a|). 
Applying this to g = Y n — Y, it follows that 



|Y„ - Y - {M nikn - H kn )\\ = \\(Y n -Y)- h(Y n - Y)\\ 

19 

< — \a\ uj(¥ n -F,\a\) 
= n-^ 2 to(V niPn ) 



Since $\omega(\mathbb{F}_n - F; |a|) = O(n^{-1/2}\sqrt{p_n \log(1/p_n)})$ almost surely (just as in the
proof of Lemma 2 for the Kiefer-Wolfowitz theorem, see Section 5), we see that the
order of $D_n$ is

    $n^{-1/2}\,p_n^{3/2}\,(\log(1/p_n))^{1/2} = O((n^{-1}\log n)^{4/5})$ almost surely

as claimed. Thus the claim (10) is proved if we can verify that B holds. □
We end this section with a short list of further problems:

• It would be of interest to prove a comparable theorem for the MLE $\widehat{F}_n$ itself
rather than $\widetilde{F}_n$. This involves several additional challenges, among which is a
complete analogue of Marshall's lemma.

• Are either $\widehat{F}_n$ or $\widetilde{F}_n$ asymptotically minimax for estimating $F \in \mathcal{F}_2$?

• We conjecture that similar results hold for $k$-monotone densities and
corresponding distribution functions ($k = 1$ corresponds to the Kiefer and
Wolfowitz monotone density case, while $k = 2$ corresponds to the convex density
case treated here). More concretely, we conjecture that under comparable
hypotheses

    $\|\bar{F}_n - \mathbb{F}_n\| = O((n^{-1}\log n)^{(k+1)/(2k+1)})$ almost surely

for $\bar{F}_n = \widetilde{F}_n$ or $\bar{F}_n = \widehat{F}_n$, the least squares estimator or MLE of $F \in \mathcal{F}_k$.
Some progress on the local theory of the corresponding density estimators is
given in Balabdaoui and Wellner [2] and Balabdaoui and Wellner [3]. On the
interpolation theory side, the results of Dubeau and Savoie may be useful.

• What is the exact order (in probability or expectation) of $\|\widetilde{F}_n - \mathbb{F}_n\|$ in the
case $k = 2$? Is it $(n^{-1}\log n)^{3/5}$ as perhaps suggested by the results of Durot
and Tocquet in the case $k = 1$?
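The exponents appearing in (1), (9) and (11), and in the conjecture above, follow one pattern, which the following sketch tabulates with exact rational arithmetic. The function names are hypothetical; the formula $(k+1)/(2k+1)$ is the conjectured exponent stated above, and the second function simply subtracts it from $1/2$ to obtain the power of $n$ left after multiplying by $\sqrt{n}$.

```python
from fractions import Fraction


def rate_exponent(k):
    """Exponent r in ||F_bar_n - F_n|| = O((log n / n)^r) for k-monotone densities."""
    return Fraction(k + 1, 2 * k + 1)


def sqrt_n_exponent(k):
    """Power of n remaining after scaling the distance by sqrt(n)."""
    return Fraction(1, 2) - rate_exponent(k)
```

For $k = 1$ this gives $2/3$ and $n^{-1/6}$, matching Theorem 1.1 and the display following it; for $k = 2$ it gives $3/5$ and $n^{-1/10}$, matching (9) and (11).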

3. Asymptotic convexity of $\mathbb{H}_{n,k}^{(2)}$

In this section we write $C$ for the complete spline interpolation operator that maps
functions $g \in C^1[0,\tau]$ into their complete spline interpolants $C[g]$ (based on the
fixed knot sequence $0 = a_0 < a_1 < \cdots < a_k = \tau$); thus in this section our $C$ is de
Boor's operator $I_4$. Thus we have

    $\mathbb{H}_{n,k} = C[\mathbb{Y}_n].$

It follows from the formula for $c_{4,i}$ in (5) on page 40 of de Boor [5] that the slope
of $\mathbb{H}_{n,k}^{(2)}$ on the interval $[a_{j-1}, a_j]$ is given by

    $B_j \equiv B_j(CS) = \frac{12}{(\Delta_j a)^3}\left(\frac{\mathbb{H}_{n,k}^{(1)}(a_{j-1}) + \mathbb{H}_{n,k}^{(1)}(a_j)}{2}\,\Delta_j a - \Delta_j \mathbb{Y}_n\right)$

where $\Delta_j f \equiv f(a_j) - f(a_{j-1})$ for $j = 1, \ldots, k$ and any function $f$ on $[0,\tau]$.

In the following we will let $\mathcal{H}$ denote the Hermite interpolation operator that
maps $\mathbb{Y}_n$ to $\mathbb{H}_{n,Herm}$: thus $\mathbb{H}_{n,Herm} = \mathcal{H}[\mathbb{Y}_n]$, $\mathbb{H}_{n,Herm}^{(2)} = (\mathcal{H}[\mathbb{Y}_n])^{(2)}$, and so forth. It
is important to note that the corresponding slopes of the second derivative of the
Hermite interpolant, $\mathbb{H}_{n,Herm}^{(2)} = (\mathcal{H}[\mathbb{Y}_n])^{(2)}$ on $[a_{j-1}, a_j]$ are given by the same
formula as in the last display, but with $\mathbb{H}_{n,k}^{(1)}(a_i)$ replaced by $\mathbb{Y}_n^{(1)}(a_i) = \mathbb{F}_n(a_i)$,
$i = j-1, j$:

(13)    $\widetilde{B}_j \equiv B_j(Herm) = \frac{12}{(\Delta_j a)^3}\left(\frac{\mathbb{F}_n(a_{j-1}) + \mathbb{F}_n(a_j)}{2}\,\Delta_j a - \Delta_j \mathbb{Y}_n\right).$

Note that $\widetilde{B}_j$ is expressed explicitly as a function of the data via $\mathbb{F}_n$ and $\mathbb{Y}_n$,
whereas $B_j$ still involves $\mathbb{H}_{n,k} = C[\mathbb{Y}_n]$ and hence also the interpolation operator
$C$. Ordering of the slopes $\widetilde{B}_j$ can be shown using only Lemma 3.1 and Lemma 4.5
but (unfortunately) the generalization of Marshall's lemma does not apply to the
Hermite interpolant because the second derivative $\mathbb{H}_{n,Herm}^{(2)}$ is not continuous at the
knots. This last formula (13) agrees with the formulas for $\widehat{H}_n$ and $\widetilde{H}_n$ in Groeneboom,
Jongbloed and Wellner [10] and Groeneboom, Jongbloed and Wellner [11];
in particular (13) can be viewed as a finite sample analogue of the 3rd derivative
of the interpolant $H$ given in Groeneboom, Jongbloed and Wellner, page 1631,
but based on the fixed knots $\{a_j\}$ rather than random knots determined by the
optimization procedure. Note that the least squares estimator $\widetilde{f}_n = \widetilde{H}_n^{(2)}$ can be
viewed as the second derivative of either the Hermite interpolant or the complete
cubic spline interpolant of $\mathbb{Y}_n$ since these two interpolants have been forced equal
by the optimization procedure which determines the knots as random functions of
the data.

Set

    $A_n \equiv \{\mathbb{H}_{n,k}^{(2)} \text{ is convex on } [0,\tau]\} = \bigcap_{j=1}^{k-1} \{B_j \le B_{j+1}\}.$

To prove B, we want to bound

    $P(A_n^c) \le \sum_{j=1}^{k-1} P(B_j > B_{j+1}).$
To prepare for this, we define

    $T_{n,j} \equiv \frac{(C[\mathbb{Y}_n])^{(1)}(a_{j-1}) + (C[\mathbb{Y}_n])^{(1)}(a_j)}{2}\,\Delta_j a - \Delta_j \mathbb{Y}_n,$

    $R_{n,j} \equiv \frac{\mathbb{Y}_n^{(1)}(a_{j-1}) + \mathbb{Y}_n^{(1)}(a_j)}{2}\,\Delta_j a - \Delta_j \mathbb{Y}_n,$

    $t_{n,j} \equiv \frac{(C[Y])^{(1)}(a_{j-1}) + (C[Y])^{(1)}(a_j)}{2}\,\Delta_j a - \Delta_j Y,$

    $r_{n,j} \equiv \frac{Y^{(1)}(a_{j-1}) + Y^{(1)}(a_j)}{2}\,\Delta_j a - \Delta_j Y.$

We will frequently suppress the dependence of all of these quantities on $n$, and
simply write $T_j$ for $T_{n,j}$, $R_j$ for $R_{n,j}$, and so forth. Now $B_j = 12T_j/(\Delta_j a)^3$,
$\widetilde{B}_j = 12R_j/(\Delta_j a)^3$, and we can write

(14)    $T_j - r_j = T_j - t_j + t_j - r_j$
        $= R_j - r_j + \{T_j - t_j - (R_j - r_j)\} + t_j - r_j$
(15)    $\equiv R_j - r_j + W_j + b_j.$

We regard $R_j - r_j$ as the main random term to be controlled, and view
$T_j - t_j - (R_j - r_j) = W_j$ and $t_j - r_j = b_j$ as second order terms, the last of which is deterministic.
Thus our strategy will be to first develop an appropriate exponential bound for
$|R_j - r_j|$, and then by further separate bounds for $W_j$ and $b_j$, derive an exponential
bound for $|T_j - r_j|$.

For $0 \le s < t < \infty$, define the family of functions $h_{s,t}$ by

    $h_{s,t}(x) = (x - (s+t)/2)\,1_{(s,t]}(x).$

Note that

    $P h_{s,t} = \frac{1}{2}(F(t) + F(s))(t - s) - \int_s^t F(u)\,du,$

    $\mathbb{P}_n h_{s,t} = \frac{1}{2}(\mathbb{F}_n(t) + \mathbb{F}_n(s))(t - s) - \int_s^t \mathbb{F}_n(u)\,du,$

and, furthermore,

    $r_j = P h_{a_{j-1},a_j}, \qquad R_j = \mathbb{P}_n h_{a_{j-1},a_j}.$
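The identity for $\mathbb{P}_n h_{s,t}$ can be checked directly on data. The sketch below is illustrative only, with hypothetical function names; it uses the fact that the integral of the empirical distribution function over $[s,t]$ equals the sample average of $(t - \max(X_i, s))_+$, so both sides are computed exactly up to floating point rounding.

```python
def pn_h_direct(sample, s, t):
    """Sample mean of h_{s,t}(X) = (X - (s + t)/2) 1{s < X <= t}."""
    mid = 0.5 * (s + t)
    return sum(x - mid for x in sample if s < x <= t) / len(sample)


def ecdf(sample, u):
    """Empirical distribution function at u."""
    return sum(1 for x in sample if x <= u) / len(sample)


def integral_ecdf(sample, s, t):
    """Exact integral of the empirical distribution function over [s, t]."""
    return sum(max(0.0, t - max(x, s)) for x in sample) / len(sample)


def pn_h_formula(sample, s, t):
    """(1/2)(F_n(t) + F_n(s))(t - s) - int_s^t F_n(u) du."""
    return 0.5 * (ecdf(sample, t) + ecdf(sample, s)) * (t - s) - integral_ecdf(sample, s, t)
```

With $s = a_{j-1}$ and $t = a_j$ either function returns $R_j$, which is the quantity entering the Hermite slope in (13) through $\widetilde{B}_j = 12R_j/(\Delta_j a)^3$.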

Here is a (partial) analogue of Kiefer and Wolfowitz's Lemma 1.

Lemma 3.1. Suppose that $\gamma_1(F) < \infty$ and $R < \infty$. Let $h_{s,t}(x) = (x - (s+t)/2)\,1_{(s,t]}(x)$,
$s = a_{j-1}^{(k)} = a_{j-1}$, and $t = a_j^{(k)} = a_j$ so that $t - s = a_j - a_{j-1} = k^{-1}(1/f(a_j^*))$
for some $a_j^* \in [a_{j-1}, a_j]$. Then if $\delta_n \to 0$ and $k \ge 5\gamma_1(F)R$,

    $Pr(|R_j - r_j| > \delta_n p_n^3) = Pr(|\mathbb{P}_n - P|(h_{s,t}) > \delta_n p_n^3)$
    $\le 2\exp\left(-\frac{3n\delta_n^2 f^2(a^*)p_n^3}{1 + p_n\delta_n f(a^*)}\right)$
    $\le 2\exp\big(-3n\delta_n^2 p_n^3 f^2(a^*)(1 + o(1))\big)$

where $o(1)$ depends on $f(a^*)$, $k_n$, and $\delta_n$.
Proof. First note that $|h_{s,t}|$ is bounded by $(t - s)/2$. Thus by Bernstein's inequality
(see e.g. van der Vaart and Wellner [23], page 102),

    $Pr(|\mathbb{P}_n h_{s,t} - P h_{s,t}| > x) \le 2\exp\left(-\frac{nx^2/2}{\sigma^2 + Mx/3}\right)$

for $\sigma^2 \ge Var_F(h_{s,t}(X))$, $M = (t - s)/2 = 1/(2f(a^*)k) = [1/(2f(a^*))]\,p_n$, and
$x > 0$. Note that

    $Var(h_{s,t}(X)) \le E h_{s,t}^2(X) = \int_s^t (x - (t+s)/2)^2\,dF(x)$
    $\le f(s)(t - s)^3/12$
    $= f(s)k^{-3}/(12f^3(a^*)) = f(s)p_n^3/(12f^3(a^*))$
    $\le p_n^3/(6f^2(a^*))$

for $k \ge 5\gamma_1(F)R$ by Lemma 4.1. Then we obtain

    $Pr(|\mathbb{P}_n h_{s,t} - P h_{s,t}| > \delta_n p_n^3)$
    $\le 2\exp\left(-\frac{n\delta_n^2 p_n^6/2}{p_n^3/(6f^2(a^*)) + p_n\delta_n p_n^3/(6f(a^*))}\right)$
    $= 2\exp\left(-\frac{n\delta_n^2 f^2(a^*)p_n^3}{1/3 + p_n f(a^*)\delta_n/3}\right)$
    $= 2\exp\left(-\frac{3n\delta_n^2 f^2(a^*)p_n^3}{1 + p_n\delta_n f(a^*)}\right)$
    $= 2\exp\big(-3n\delta_n^2 p_n^3 f^2(a^*)(1 + o(1))\big)$

where the $o(1)$ term depends on $f(a^*) = f(a_j^*)$, $p_n = 1/k_n$, and $\delta_n$. □
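The algebra in the last display is easy to confirm numerically. The following sketch, with hypothetical function names, evaluates the Bernstein exponent $(nx^2/2)/(\sigma^2 + Mx/3)$ at the values used in the proof ($\sigma^2 = p_n^3/(6f^2(a^*))$, $M = p_n/(2f(a^*))$, $x = \delta_n p_n^3$) and compares it with the closed form $3n\delta_n^2 f^2(a^*)p_n^3/(1 + p_n\delta_n f(a^*))$.

```python
def bernstein_exponent(n, x, sigma2, M):
    """Exponent (n x^2 / 2) / (sigma^2 + M x / 3) in Bernstein's inequality."""
    return (n * x * x / 2.0) / (sigma2 + M * x / 3.0)


def lemma_exponent(n, delta, p, f_star):
    """Closed form 3 n delta^2 f^2 p^3 / (1 + p delta f) from Lemma 3.1."""
    return 3.0 * n * delta ** 2 * f_star ** 2 * p ** 3 / (1.0 + p * delta * f_star)


def exponent_via_proof(n, delta, p, f_star):
    """Plug the variance bound, the sup bound, and x = delta p^3 into Bernstein."""
    sigma2 = p ** 3 / (6.0 * f_star ** 2)
    M = p / (2.0 * f_star)
    return bernstein_exponent(n, delta * p ** 3, sigma2, M)
```

The two exponents agree for any positive inputs; the arguments `delta`, `p` and `f_star` stand for $\delta_n$, $p_n$ and $f(a^*)$.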

Remark. Note that taking $\delta_n = C/k_n$ in Lemma 3.1 yields

    $Pr(|\mathbb{P}_n h_{s,t} - P h_{s,t}| > Cp_n^4) \le 2\exp\big(-3(nC^2f^2(a^*)/k_n^5)(1 + o(1))\big)$

which seems quite analogous to Lemma 4 of Kiefer and Wolfowitz (1976), but with
the power of 3 replaced by 5.

The following lemma gives a more complete version of Lemma 3.1 in that it
provides an exponential bound for $|T_j - r_j|$.

Lemma 3.2. Suppose that the hypotheses of Theorem 2.2 hold: $\beta_2(F,\tau) < \infty$,
$\gamma_2(F,\tau) < \infty$, $\gamma_1(F) < \infty$ and $R < \infty$. Then if $\delta_n = Cp_n$ for some constant $C$ and
$k \ge \{5R \vee 3\}\gamma_1(F)$,

    $Pr(|T_j - r_j| > 3\delta_n p_n^3) \le 6\exp\left(-\frac{(100)^{-1}n\delta_n^2 f^2(a^*)p_n^3}{1 + 30^{-1}p_n\delta_n f(a^*)}\right).$

Proof. This follows from a combination of Lemma 3.1, Lemma 4.2 and Lemma 4.3.
Lemma 4.2 yields

    $|b_j| = |t_j - r_j| \le o(1)p_n^4 \le \delta_n p_n^3$

if $n$ (and hence $k_n$) is sufficiently large. This implies that

    $Pr(|T_j - r_j| > 3\delta_n p_n^3) \le Pr(|T_j - t_j| > 3\delta_n p_n^3 - |t_j - r_j|)$
    $\le Pr(|T_j - t_j| > 2\delta_n p_n^3).$

In view of the decomposition (15), this yields

    $Pr(|T_j - r_j| > 3\delta_n p_n^3) \le Pr(|R_j - r_j| > \delta_n p_n^3) + Pr(|W_j| > \delta_n p_n^3)$
    $\le 6\exp\left(-\frac{(100)^{-1}n\delta_n^2 f^2(a^*)p_n^3}{1 + 30^{-1}p_n\delta_n f(a^*)}\right)$

by Lemma 3.1, Lemma 4.3 and the fact that

    $\frac{100^{-1}A}{1 + 30^{-1}B} = \frac{30}{100}\,\frac{A}{30 + B} \le \frac{3A}{1 + B}$

for $A, B > 0$. □

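Lemma 3.3. Suppose that $\beta_2 \equiv \beta_2(F,\tau) > 0$, $\gamma_1 \equiv \gamma_1(F,\tau) < \infty$ and
$R \equiv R(f,\tau) < \infty$ for some $\tau < \alpha_1(F) \equiv \inf\{t : F(t) = 1\}$. Let

    $A_n \equiv \{\mathbb{H}_{n,k}^{(2)} \text{ is convex on } [0,\tau]\}.$

Then

(16)    $P(A_n^c) \le 12k_n \exp\big(-K\beta_2^2(F,\tau)\,np_n^5\big)$

where $K^{-1} = 8^2 \cdot 144^2 \cdot 16 \cdot 200 = 4,246,732,800 \le 4.3 \cdot 10^9$.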
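Proof. Since

    $A_n^c = \bigcup_{j=1}^{k_n-1} \{B_j > B_{j+1}\},$

it follows that

    $P(A_n^c) \le \sum_{j=1}^{k_n-1} P(B_j > B_{j+1})$
    $= \sum_{j=1}^{k_n-1} P\big(B_j > B_{j+1},\ |T_i - r_i| \le 3\delta_{n,j}p_n^3,\ i = j, j+1\big)$
    $\quad + \sum_{j=1}^{k_n-1} P\big(B_j > B_{j+1},\ |T_i - r_i| > 3\delta_{n,j}p_n^3 \text{ for } i = j \text{ or } i = j+1\big)$
    $\le \sum_{j=1}^{k_n-1} P\big(B_j > B_{j+1},\ |T_i - r_i| \le 3\delta_{n,j}p_n^3,\ i = j, j+1\big)$
    $\quad + \sum_{j=1}^{k_n-1} P\big(|T_i - r_i| > 3\delta_{n,j}p_n^3 \text{ for } i = j \text{ or } i = j+1\big)$
(17)    $\equiv I_n + II_n$

where we take

    $\delta_{n,j} = \frac{C(F,\tau)}{k_n f(a_j^*)} = \frac{C(F,\tau)}{f(a_j^*)}\,p_n;$

here $a_j^* \in [a_{j-1}, a_j]$ satisfies $\Delta_j a = a_j - a_{j-1} = 1/(k_n f(a_j^*))$, and $C(F,\tau)$ is a
constant to be determined. We first bound $II_n$ from above. By Lemma 3.2 we
know that

    $P\big(|T_j - r_j| > 3\delta_{n,j}p_n^3\big) \le 6\exp\left(-\frac{(100)^{-1}n\delta_{n,j}^2 f^2(a_j^*)p_n^3}{1 + 30^{-1}p_n\delta_{n,j}f(a_j^*)}\right)$

where $\delta_{n,j}^2 f^2(a_j^*)p_n^3 = C^2(F,\tau)p_n^5$ and

    $\frac{1}{1 + 30^{-1}p_n\delta_{n,j}f(a_j^*)} = \frac{1}{1 + 30^{-1}C(F,\tau)p_n^2} \ge \frac{1}{2}$

when $k_n \ge [30^{-1}C(F,\tau)]^{1/2}$. Hence,

(18)    $P\big(|T_j - r_j| > 3\delta_{n,j}p_n^3\big) \le 6\exp\big(-(200)^{-1}C^2(F,\tau)\,np_n^5\big).$

We also have

    $P\big(|T_{j+1} - r_{j+1}| > 3\delta_{n,j}p_n^3\big) \le 6\exp\left(-\frac{(100)^{-1}n\delta_{n,j}^2 f^2(a_{j+1}^*)p_n^3}{1 + 30^{-1}p_n\delta_{n,j}f(a_{j+1}^*)}\right)$

where $a_{j+1}^* \in [a_j, a_{j+1}]$ and $a_{j+1} - a_j = \Delta_{j+1}a = 1/(k_n f(a_{j+1}^*))$. By Lemma 5.1 we
have $f(a_{j-1})/f(a_j) \le 2$ if $k_n \ge 5\gamma_1(F,\tau)R$. But this implies that $f(a_j^*)/f(a_{j+1}^*) \le 4$
since

    $\frac{f(a_j^*)}{f(a_{j+1}^*)} = \frac{f(a_j^*)}{f(a_j)}\,\frac{f(a_j)}{f(a_{j+1}^*)} \le \frac{f(a_{j-1})}{f(a_j)}\,\frac{f(a_j)}{f(a_{j+1})} \le 2 \times 2 = 4.$

Hence, we can write

    $\delta_{n,j}^2 f^2(a_{j+1}^*)p_n^3 = \frac{f^2(a_{j+1}^*)}{f^2(a_j^*)}\,C^2(F,\tau)p_n^5 \ge \frac{1}{16}\,C^2(F,\tau)p_n^5$

and, since $f(a_{j+1}^*)/f(a_j^*) \le 1$,

    $\frac{1}{1 + 30^{-1}p_n\delta_{n,j}f(a_{j+1}^*)} = \frac{1}{1 + 30^{-1}C(F,\tau)p_n^2 f(a_{j+1}^*)/f(a_j^*)} \ge \frac{1}{1 + 30^{-1}C(F,\tau)p_n^2} \ge \frac{1}{2}$

when $k_n \ge \{30^{-1}C(F,\tau)\}^{1/2}$. Thus, we conclude that

(19)    $P\big(|T_{j+1} - r_{j+1}| > 3\delta_{n,j}p_n^3\big) \le 6\exp\left(-\frac{1}{16 \cdot 200}\,C^2(F,\tau)\,np_n^5\right).$

Combining (18) and (19), we get

    $II_n \le 12k_n \exp\left(-\frac{1}{16 \cdot 200}\,C^2(F,\tau)\,np_n^5\right).$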
Now we need to handle $I_n$. Recall that

    $B_j = 12\,\frac{T_j}{(\Delta_j a)^3}, \qquad B_{j+1} = 12\,\frac{T_{j+1}}{(\Delta_{j+1}a)^3}.$

Thus, the event

    $\big\{B_j > B_{j+1},\ |T_i - r_i| \le 3\delta_{n,j}p_n^3,\ i = j, j+1\big\}$

is equal to the event

    $\left\{\frac{T_j}{(\Delta_j a)^3} > \frac{T_{j+1}}{(\Delta_{j+1}a)^3},\ |T_i - r_i| \le 3\delta_{n,j}p_n^3,\ i = j, j+1\right\}.$

Then, it follows that

    $\frac{T_j}{(\Delta_j a)^3} \le \frac{r_j}{(\Delta_j a)^3} + \frac{3\delta_{n,j}p_n^3}{(\Delta_j a)^3}$

and

    $\frac{T_{j+1}}{(\Delta_{j+1}a)^3} \ge \frac{r_{j+1}}{(\Delta_{j+1}a)^3} - \frac{3\delta_{n,j}p_n^3}{(\Delta_{j+1}a)^3},$

and hence

    $\frac{T_j}{(\Delta_j a)^3} - \frac{T_{j+1}}{(\Delta_{j+1}a)^3} \le \frac{r_j}{(\Delta_j a)^3} - \frac{r_{j+1}}{(\Delta_{j+1}a)^3} + \frac{3\delta_{n,j}p_n^3}{(\Delta_j a)^3} + \frac{3\delta_{n,j}p_n^3}{(\Delta_{j+1}a)^3}.$

The first term in the right side of the previous inequality is the leading term in
the sense that it determines the sign of the difference of the slope of $\mathbb{H}_{n,k}^{(2)}$.
By Lemma 4.5 we can write



r j+i 



(A,a) 3 (A j+ ia) 



< -±f»(a?)A ja + ±(f"A ja - f; +1 A j+1 a). 



Let $a_j^* \in [a_{j-1}, a_j]$ be such that $\Delta_j a = p_n[f(a_j^*)]^{-1}$. Then, we can write



(Aj-a) 3 (A J+ ia) 3 12 



f"(a?)A ja + ^(f"A 3 a f! +1 A j+1 a) 
1 



< 6<5 nj / 3 ( a *) - -f"(a**)A j a+-(f j A j a-f; +1 A j+1 a 



6/ 2 (a*) <$„ 



= 6/ 2 (a*) S n 



6/ 2 (a*) ^ <5 n 



12 

1 />**) 
72 / 3 (a*) 3 

l_/>f) 
72 / 3 (a*) 

l_/>f) 
72 p(a*) 1 



UAp{a*) 



{f"A ja -r; A j+1 a) 



Pn 



144/ 3 (a*) 



7 - f " 



A j+ ia 



1 / 



■i+i 



f 



144 / 3 (a*) I / 



•3 + 1 



■i+i Aja 

A 3 +ia \ 
A, a i 
6/ 2 («*) { 5 n 



1 f"(a**) f 3 (a*j*), 



72/3(a**) / 3 (a*r" 

1 4+1 ( Jj_ _ A,+io \ 
144 /3(a*) l v /; +i A 3 -a j 



Pn 



< 6/ 2 (a*) ^ 5 n 



1 

72 8 P " 

72 8 Pn 
1 /" / 7" 



7" 



Aj + ia 

144 /3(aJ) A J0 y 



144 /3(a*) \ V /" 1 



1 + 1 



A J+ ia 
A, a _ 



where (using arguments similar to those of Lemma 4.2 and taking the bound on
$|f''_j - f''_{j+1}|$ to be $\epsilon\|f''\|$, which is possible by uniform continuity of $f''$ on $[0,\tau]$)



i-j+l 



- 1 < 



i-j+l 



- 1 



< 



ef(r) 3 l2 (F,r) 
f" 



if k n > max(57i(F, t)R, (yl + \)R/rf) for a given rj > and 

Aj+ia 



Hence 



1 _ Aj + lQ < 

Aj-a 



Aj-a 



1 



< 87i(F,r)p„ 



Ta^ + (a~^~ I2 7 (a -> ) A ^ + ^(/i A ^-4 + A +1 ^ 



<6/ 2 (a*)U 



1 Wt) 
72 8 



+ I ^72(F,r) 7l (^r)p2 



where we can choose $\epsilon$ and $p_n$ small enough so that



for example 



16 72 (F,r)' /3 2 (F,r)- 
The above choice yields 

s»^«){*.-^«.}-o 
by choosing



\[ \delta_n = C(F,\tau)\,p_n = \frac{\beta_2(F,\tau)}{8 \cdot 144}\,p_n, \]

i.e. $C(F,\tau) = \beta_2(F,\tau)/(8 \cdot 144)$. For such a choice, the first term $I_n$ in (17) is identically equal to 0. □



4. Appendix 1: technical lemmas 
Lemma 4.1. Under the hypotheses of Theorem 

\[ 1 \le \frac{f(a_{j-1})}{f(a_j)} \le \frac{\Delta_{j+1}a}{\Delta_{j-1}a} \le 2 \]

uniformly in $j$ if $k \ge 5\gamma_1 R$.

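Proof. Note that for each interval $I_j = [a_{j-1}, a_j]$ we have

\[ p_n = \int_{I_j} f(x)\,dx = f(a_j^*)\,\Delta_j a \ \begin{cases} \ge f(a_j)\,\Delta_j a, \\ \le f(a_{j-1})\,\Delta_j a, \end{cases} \]

where $a_j^* \in I_j$. Thus

\[ \frac{p_n}{\Delta_{j+1}a} \le f(a_j) \le \frac{p_n}{\Delta_j a} \]

and

\[ \frac{p_n}{\Delta_j a} \le f(a_{j-1}) \le \frac{p_n}{\Delta_{j-1}a}. \]

It follows that

\[ 1 \le \frac{f(a_{j-1})}{f(a_j)} \le \frac{\Delta_{j+1}a}{\Delta_{j-1}a} = \frac{\Delta_{j+1}a}{\Delta_j a} \cdot \frac{\Delta_j a}{\Delta_{j-1}a}. \]

Thus we will establish a bound for $\Delta_{j+1}a/\Delta_j a$. Note that with $c \equiv F(\tau) < 1$,

\begin{align*}
\Delta_{j+1}a = a_{j+1} - a_j &= F^{-1}\Big(\frac{j+1}{k}c\Big) - F^{-1}\Big(\frac{j}{k}c\Big) \\
&= \frac{c}{k}\,\frac{1}{f(a_j)} + \frac{c^2}{2k^2}\,\frac{-f'(\xi_{j+1})}{f^3(\xi_{j+1})} \\
&= \frac{c}{k f(a_j)}\Big\{1 + \frac{c}{2k}\,\frac{-f'(\xi_{j+1})}{f^2(\xi_{j+1})}\,\frac{f(a_j)}{f(\xi_{j+1})}\Big\} \\
&\le \frac{c}{k f(a_j)}\Big\{1 + \frac{c\gamma_1}{2k}R\Big\}
\end{align*}

for some $\xi_{j+1} \in I_{j+1}$, where $R < \infty$ and $\gamma_1 < \infty$.
Similarly, expanding to second order (about $a_j$ again!),

\begin{align*}
\Delta_j a = a_j - a_{j-1} &= F^{-1}\Big(\frac{j}{k}c\Big) - F^{-1}\Big(\frac{j-1}{k}c\Big) \\
&= \frac{c}{k}\,\frac{1}{f(a_j)} + \frac{c^2}{2k^2}\,\frac{f'(\xi_j)}{f^3(\xi_j)} \\
&= \frac{c}{k f(a_j)}\Big\{1 + \frac{c}{2k}\,\frac{f'(\xi_j)}{f^2(\xi_j)}\,\frac{f(a_j)}{f(\xi_j)}\Big\} \\
&\ge \frac{c}{k f(a_j)}\Big\{1 - \frac{c\gamma_1}{2k}\Big\}
\end{align*}

since $f(a_j)/f(\xi_j) \le 1$ and $f'(\xi_j) \le 0$, where $\xi_j \in I_j$. Thus it follows that for $k = k_n$ so large that $\gamma_1/(2k) \le 1/2$ we have

\[ \frac{\Delta_{j+1}a}{\Delta_j a} \le \frac{1 + \gamma_1 R/(2k)}{1 - \gamma_1/(2k)} \le 1 + \frac{\gamma_1(R+1)}{k} \]

if $k = k_n \ge \gamma_1$. The last inequality here follows (using $1/(1-y) \le 1 + 2y$ for $0 \le y \le 1/2$) from

\[ \frac{\gamma_1}{k}(R/2 + 1) + \frac{\gamma_1^2}{2k^2}R \le \frac{\gamma_1}{k}(R + a), \]

which holds if and only if

\[ (R/2 + 1) + \frac{\gamma_1}{2k}R \le R + a \]

or, equivalently, if and only if

\[ \frac{\gamma_1}{2k}R \le R/2 + a - 1, \quad \text{or} \quad k \ge \frac{\gamma_1 R}{R + 2(a-1)} = \gamma_1 \]

if $a = 1$. It now follows that

\[ 1 \le \frac{f(a_{j-1})}{f(a_j)} \le \frac{\Delta_{j+1}a}{\Delta_{j-1}a} = \frac{\Delta_{j+1}a}{\Delta_j a} \cdot \frac{\Delta_j a}{\Delta_{j-1}a} \le 2 \]

if

\[ \frac{\Delta_{i+1}a}{\Delta_i a} \le \sqrt{2} \]

for $i = j-1, j$. But these inequalities hold if $k$ is so large that $1 + \gamma_1(R+1)/k \le \sqrt{2}$, or $k \ge 5\gamma_1 R > \gamma_1(R+1)/(\sqrt{2}-1)$, since $R \ge 1$ and $1/(\sqrt{2}-1) < 5/2$. □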
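The chain of inequalities in Lemma 4.1 can be checked numerically on a simple example. The following is only a sketch under stated assumptions: it takes $f(x) = e^{-x}$ restricted to $[0,\tau]$, knots $a_j = F^{-1}(jc/k)$ with $c = F(\tau)$ as in the proof, and it assumes the definitions $\gamma_1 = \sup|f'|/\inf f^2$ and $R = \sup f/\inf f$ on $[0,\tau]$, which are not restated in this appendix. The function names are illustrative.

```python
import math

def knots_truncated_exponential(k, tau):
    """Knots a_j = F^{-1}(j c / k), j = 0..k, for F(x) = 1 - exp(-x), c = F(tau)."""
    c = 1.0 - math.exp(-tau)
    return [-math.log(1.0 - j * c / k) for j in range(k + 1)]

def check_lemma_4_1(k, tau):
    """True if 1 <= f(a_{j-1})/f(a_j) <= D_{j+1}/D_{j-1} <= 2 for j = 2..k-1."""
    a = knots_truncated_exponential(k, tau)
    # delta[j] = a_j - a_{j-1}; index 0 is unused
    delta = [None] + [a[j] - a[j - 1] for j in range(1, k + 1)]
    tol = 1e-12
    for j in range(2, k):
        density_ratio = math.exp(-a[j - 1]) / math.exp(-a[j])
        spacing_ratio = delta[j + 1] / delta[j - 1]
        if not (1.0 - tol <= density_ratio <= spacing_ratio + tol <= 2.0 + tol):
            return False
    return True

tau = 1.0
gamma1 = math.exp(2.0 * tau)  # assumed definition: sup|f'| / inf f^2 on [0, tau]
R = math.exp(tau)             # assumed definition: sup f / inf f on [0, tau]
k = math.ceil(5.0 * gamma1 * R)
print(k, check_lemma_4_1(k, tau))
```

For this example the threshold $5\gamma_1 R$ is conservative: the spacing ratios are already close to 1 well below it.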

Lemma 4.2. Under the hypotheses of Theorem 2.2,

\[ \max_{1 \le j \le k} \frac{|t_j - r_j|}{(\Delta_j a)^4} = o(1), \]

where the $o(1)$ depends only on $\tau$, $\gamma_1(F,\tau)$, and $\gamma_2(F,\tau)$.
Remark. Note that 

(20) max fc l^-^l < ^l«| 4 r (4) ll = ^|a| 4 ||/"|| < ^(^K- 
This follows since

\begin{align*}
r_j - t_j &= \tfrac{1}{2}\big[Y^{(1)}(a_{j-1}) + Y^{(1)}(a_j) - \big((C[Y])^{(1)}(a_{j-1}) + (C[Y])^{(1)}(a_j)\big)\big]\Delta_j a \\
&= \tfrac{1}{2}\big\{\big(Y^{(1)}(a_{j-1}) - (C[Y])^{(1)}(a_{j-1})\big) + \big(Y^{(1)}(a_j) - (C[Y])^{(1)}(a_j)\big)\big\}\Delta_j a,
\end{align*}

and hence, by de Boor [5], (20), page 56, this yields (20). The claim of Lemma 4.2 is stronger because it makes a statement about the differences $t_j - r_j$ relative to $(\Delta_j a)^4$; this is possible because only differences between the derivative of $Y$ and the derivative of its interpolant $C[Y]$ at the knots are involved.

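Proof. We have

\[ (21) \qquad r_j - t_j = \tfrac{1}{2}\big(\mathcal{E}_Y^{(1)}(a_{j-1}) + \mathcal{E}_Y^{(1)}(a_j)\big)\Delta_j a, \]

where $\mathcal{E}_g \equiv g - C[g]$. Now, using the result of Problem 2a, Chapter V of de Boor [5] (compare also with the formula (3.52) given in Nürnberger [20]), we have

\[ \delta_j\,\mathcal{E}_Y^{(1)}(a_{j-1}) + 2\,\mathcal{E}_Y^{(1)}(a_j) + (1 - \delta_j)\,\mathcal{E}_Y^{(1)}(a_{j+1}) = \beta_j \]

for $j = 1, \dots, k-1$, where

\[ \delta_j = \frac{a_{j+1} - a_j}{a_{j+1} - a_{j-1}} = \frac{\Delta_{j+1}a}{\Delta_j a + \Delta_{j+1}a} \]

and

\[ \beta_j = \frac{\delta_j(-\Delta_j a)^3 f''(\xi_{1,j}) + (1 - \delta_j)(\Delta_{j+1}a)^3 f''(\xi_{2,j})}{24}, \]

with $\xi_{1,j}, \xi_{2,j} \in [a_{j-1}, a_{j+1}]$. By Problem IV 7(a) in de Boor [5] and the techniques used in Chapter III (see in particular equation (9)), a bound on the maximal value at the knots of the derivative interpolation error can be derived using the following inequality:

\[ (22) \qquad \max_{0 \le j \le k} |\mathcal{E}_Y^{(1)}(a_j)| \le \max\Big\{ |\mathcal{E}_Y^{(1)}(a_0)|, \ \max_{1 \le j \le k-1} |\beta_j|, \ |\mathcal{E}_Y^{(1)}(a_k)| \Big\}. \]

By definition of the complete cubic spline, $\mathcal{E}_Y^{(1)}(a_0) = \mathcal{E}_Y^{(1)}(a_k) = 0$. Thus, we will focus now on getting a sharp bound for $\max_{1 \le j \le k-1} |\beta_j|$ under our hypotheses.
This will be achieved as follows: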
• Expanding 8j around 1/2: We have 

x _ a j+i~ a j _ K if( a *j+i)}~ 



(a j+1 - aj) + (Oj - Oj'-i) k^ifia*)}- 1 + fc„ 1 [/(a*)] 
where $a_j^* \in [a_{j-1}, a_j]$ and $a_{j+1}^* \in [a_j, a_{j+1}]$, and hence
/(a* +1 ) - /(a*) 



2(/«) + f(a* +1 )) 



/'«*) 



2(/(a*) + f(a* +1 )) 
/'«*) 



(a, -a,) 



2(/(a*) + /(a* +1 )) a, - a,_ 



M j A,a 



where 



/'(or) 



< 



< 



< 



2(/(a*) + /(a* +1 )) a, - 0j -_i 

l/'( a j-i)l Qj+i - aj-i 
4/(aj+i) 0,-0,-1 
l/'K-i)| /(aj-i) / a j+ i -Oj 



4/(aj-i) /(a i+ i) Vaj - Oj-i 

l/'(°j-i)l 
4/(%-i) 



= (V2 + 1) 



2-2-(V2+l), for fc n > 57ii? 
l/'fo-i)! 



/( a i-i) 

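• Approximation of $f''(\xi_{1,j})$ and $f''(\xi_{2,j})$: Define $\epsilon_{1,j}$ and $\epsilon_{2,j}$ by

\[ \epsilon_{1,j} = f''(\xi_{1,j}) - f''(a_{j-1}), \quad \text{and} \quad \epsilon_{2,j} = f''(\xi_{2,j}) - f''(a_{j-1}). \]

By uniform continuity of $f''$ on the compact set $[0,\tau]$, for every $\epsilon > 0$ there exists an $\eta = \eta_\epsilon > 0$ such that $|x - y| < \eta$ implies $|f''(x) - f''(y)| < \epsilon$. Fix $\epsilon > 0$ (to be chosen later). We have $\xi_{1,j}, \xi_{2,j} \in [a_{j-1}, a_{j+1}]$, where, by the proof of Lemma 4.1, if $k_n > 5\gamma_1 R$,

\[ a_{j+1} - a_{j-1} = a_{j+1} - a_j + a_j - a_{j-1} \le \frac{\sqrt{2}+1}{k_n f(a_j^*)} \le \frac{(\sqrt{2}+1)R}{k_n}. \]

Thus, if we choose $k_n$ such that $k_n > \max\big(5\gamma_1 R, (\sqrt{2}+1)\eta^{-1}R\big)$, then $a_{j+1} - a_{j-1} < \eta$ for all $j = 1, \dots, k$ and furthermore

\[ \max\{|f''(\xi_{1,j}) - f''(a_{j-1})|, \ |f''(\xi_{2,j}) - f''(a_{j-1})|\} < \epsilon \quad \text{for } j = 1, \dots, k, \]

or, equivalently, $\max\{|\epsilon_{1,j}|, |\epsilon_{2,j}|\} < \epsilon$, $j = 1, \dots, k$.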
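• Expanding $\Delta_{j+1}a$ around $\Delta_j a$: We have

\begin{align*}
\Delta_{j+1}a = a_{j+1} - a_j &= a_j - a_{j-1} + [a_{j+1} - a_j - (a_j - a_{j-1})] \\
&= \Delta_j a + \Delta_j a\Big(\frac{a_{j+1} - a_j}{a_j - a_{j-1}} - 1\Big) = \Delta_j a + \Delta_j a\,\epsilon_{3,j},
\end{align*}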
where



e 3, 



Oj+1 - Qj 

-/'(or) 



/K) 



/(<+i) 



(a i+1 -aA 



f(a*)~f(a* +1 ) 

/Kn) 



Thus, 



k3,il < 



l/'(%-l)l 



' ! '' + 1 - a 3-l) 
( 1 



f(a j+1 ) 

_ l/'fa-i)! 

f(a j+ i) ^fc„/(a*) ^ k n f(a* j+1 ) ^ 

< 2 LT(a i -i)j J_ =2 |/'(a J -_ 1 )| f /(«i- , . 



/ 2 («i+i) fc « P{aj-\) \f(a j+1 )J k n 

< 2 . 2 4 l/'fcj-i)! 1 =3 J/ / K-i)l 1 < 32 ;fr J_. 
/ 2 (a J _i) fc n f 2 (a,j-i) k n k n 

Above, we have used the fact that $k_n > 5\gamma_1 R$ to be able to use the inequality $f(a_{j-1})/f(a_{j+1}) \le 2^2$.

Now, expansion of $\beta_j$ yields, after straightforward algebra,



24/3, 



2M j /"(a i _ 1 )(A j a) 



MjAja (3 + 3e 3J + e§ ,)(/" (aj-i) + e 2,i) e 3J 



= T l ,3 + T 2, 



T-: 



3,3 



where 



^ -2| Mj m. J -.)(A 1 .) < 2(V5 + l,JM rta . l) _^_, 
< 4(V2 + 1)7!/; -J- < 4(V2 + l) 7l72 /(r) 3 -J- 



<2- 1 (\/2 + l)7i72r 



-3 



n » 



since $f(\tau) \le (2\tau)^{-1}$ by (3.1), page 1669, Groeneboom, Jongbloed and Wellner



f 2,jl 



(A,a)3 



< 2 



1 | 2(y/2+l)7i 

2 fc n 



e < 2 



1 2(^+1)' 



5i? 



M 2 e, 
and



(A,a)3 



< 



< 



1 , 2U 2 - li-, 



£ 5 



k n 




2(V2 + 




5R 




2(72 + 




5i? / 



9671 32 2 7 2 



(7? + 4 



96 32 2 \ 
5i? + 25i?2j 



272/(r) 



1 



n 96 32 2 \ „ 2 , 1 ,1 



if we choose $\epsilon < \gamma_2 f(\tau)^3 = \sup_{0<t<\tau} f''(t)$ and again use $f(\tau) \le (2\tau)^{-1}$. Note
that by 



< 



(A,a) 2 



Thus, using (22) and combining the results obtained above, we can write for
$j = 1, \dots, k$,



(Aj-a) 3 ~ i<"<k-i (Aj-a 



< max 



< 24" 1 max 



|T M | + \T 2 ,i\ + |T 3)i | \a[ 



< 



1 



l«<fc-l 
3 



(A,a) 2 



(23) 



(Mi + M 3 ) — + M 2 e 
(Mi + M 3 ) i + M 2 e 



(A j0 ) 



Ao-a 



(A,a)3 J 



But note that 

w 3 



A, 



(A^a) 3 i<i<k V A,- a 



max 



{<i<k v/K) 



< max 



< 



/(Qj-i) 
/(t) 



where 



/(Qj-i) = /(Qj-i) = . /(Qj) /(a/c-i) 

/(T) /(flfc) /(%) /(«j + l) /(Ofc) 

and, for £ = 0, . . . , k — 1, 

/( a = 1 , f(ai) - f(ai+i) 



/(fflj+l) 



= 1 
< 1 
= 1 



f(ai+x) 
-/'« 



= l+ 1 f^(a l+1 -a l ), 
f{ai+i) 



€ [a/,a /+ i] 



, a, S [a i; a ;+ i] 



/(0 I+ l)/(0f*) fcn 
-f(^) 1 

/(oz+i)/(a?*) fc„ 

-/>/) / 2 («0 J_ 
/ 2 (a ; ) /(o, +1 )/(of*) fe„ 



4: /Crj, 
Hence,



| |8 / ^ n 3(k n +2-j) ^ / ^ lN 3( fen+2 ) 



(Aja) 3 ~V 4fc„/ ~\ 4 k, 

< 24 » s2 ( i+2 f jtr^" 

if $k_n \ge \gamma_1/(4(2^{1/6}-1))$, where we used $\log(1+x) \le x$ for $x > 0$ in the last inequality.
Combining (23) with (24), it follows that if we choose



\[ k_n > \max\big\{5\gamma_1 R, \ \gamma_1/(4(2^{1/6}-1)), \ (\sqrt{2}+1)\eta^{-1}R\big\} \]



then 



or 



(A i0 )3 " 4e 



(Mi + M 3 )i + M 2 e Aja = o(A i a) 



I^-^I = (1) 

where $o(1)$ is uniform in $j$. □
Lemma 4.3. Under the hypotheses of Theorem 2.2,

( (100)- 1 n<e/ a (aJ)p2\ 
P,- (IT, - «, - (*, - rj )| > fcrf) < 4e*p [-y^ j • 

Proof. Write 

(¥»-y)( 1 )(a J _ 1 ) + (Y»-y)W(a J ) 
2 

(C[Y„-y])( 1 )(a J _ 1 )+ (C[Y n -y])«( 0i ) 



Aj-a 



2 

where 

fW(t)^( 5 -C[. 9 ]) (1) W. 
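But for $g \in C^1[a_{j-1}, a_j]$ with $g^{(1)}$ of bounded variation,

\begin{align*}
g(t) &= g(a_{j-1}) + g'(a_{j-1})(t - a_{j-1}) + \int_{a_{j-1}}^{t} (t-u)\,dg^{(1)}(u) \\
&= P_j(t) + \int_{a_{j-1}}^{a_j} g_u(t)\,dg^{(1)}(u),
\end{align*}

where $g_u(t) \equiv (t-u)_+ = (t-u)1_{[t \ge u]}$. Since $C$ is linear and preserves linear functions,

\[ C[g](t) = P_j(t) + \int_{a_{j-1}}^{a_j} Cg_u(t)\,dg^{(1)}(u), \]

and this yields

\[ \mathcal{E}_g(t) = \int_{a_{j-1}}^{a_j} \mathcal{E}_{g_u}(t)\,dg^{(1)}(u) \]

and

\[ \mathcal{E}_g^{(1)}(t) = \int_{a_{j-1}}^{a_j} \mathcal{E}_{g_u}^{(1)}(t)\,dg^{(1)}(u). \]

Applying this second formula to $g = \mathbb{Y}_n - Y$ yields the relation

\[ \mathcal{E}_{\mathbb{Y}_n - Y}^{(1)}(t) = \int_{a_{j-1}}^{a_j} \mathcal{E}_{g_u}^{(1)}(t)\,d(\mathbb{F}_n - F)(u). \]

Now $g_u$ is absolutely continuous with $g_u(t) = \int_0^t g_u^{(1)}(s)\,ds$ where $g_u^{(1)}(t) = 1_{[t \ge u]}$, so by de Boor [5], (17) on page 56 (recalling that our $C = I_4$ of de Boor),

\begin{align*}
\|\mathcal{E}_{g_u}^{(1)}\| = \|g_u^{(1)} - (C[g_u])^{(1)}\|
&\le (19/4)\,\mathrm{dist}(g_u^{(1)}, \$_3) \le (19/4)\,\mathrm{dist}(g_u^{(1)}, \$_2) \\
&\le (19/4)\,\omega(g_u^{(1)}, |a|) \le (19/4) < 5.
\end{align*}

Thus the functions $(u,t) \mapsto \mathcal{E}_{g_u}^{(1)}(t)\Delta_j a$ are bounded by a constant multiple of $\Delta_j a$, while the functions $h_{j,l}(u) = \mathcal{E}_{g_u}^{(1)}(a_l)1_{[a_{j-1},a_j]}(u)\Delta_j a$, $l \in \{j-1, j\}$, satisfy

\[ \mathrm{Var}[h_{j,l}(X)] \le (\Delta_j a)^2 \int_{I_j} (19/4)^2 f(u)\,du \le 5^2 (\Delta_j a)^3 f(a_{j-1}) \]

for $k \ge 5\gamma_1(F,\tau)R$ as in the proof of Lemma 3.1 in section 3. By applying Bernstein's inequality much as in the proof of Lemma 3.1 we find that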

Pr (l4^_ y (o,)l > 6nP 3 n ) 



< 2 exp 
= 2 exp 
= 2 exp 



50p 3 Jf(a*r+ Pn (5/3)S n pl/f(a*) 

nSlP(a*)pl \ 
100+(10/3)p„/(a*)<5„ J 



{lQQ)-^n5lp{a*)pl ' 
1 + (l/3Q) Pn S n f(a*) 



Thus it follows that 

Pr(\W j \>5 nP 3 n ) 
< Pr (jS^yia^)] > S n p 3 n ) 

+ Pr(\£^_ Y (a j )\>6 nP 3 n ) 



< 4 exp 



1 + (1/30)^/(05 
This completes the proof of the claimed bound. □
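Lemma 4.4. Let $R(s,t)$ be defined by

\[ R(s,t) \equiv P h_{s,t} = \tfrac{1}{2}\big(F(t) + F(s)\big)(t - s) - \int_s^t F(u)\,du, \qquad 0 \le s \le t < \infty. \]

Then

\[ (25) \qquad R(s,t) \ \begin{cases} \le \tfrac{1}{12} f'(s)(t-s)^3 + \tfrac{1}{24} \sup_{s \le x \le t} f''(x)\,(t-s)^4, \\[2pt] \ge \tfrac{1}{12} f'(s)(t-s)^3 + \tfrac{1}{24} \inf_{s \le x \le t} f''(x)\,(t-s)^4. \end{cases} \]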



Remark. It follows from the Hadamard-Hermite inequality that for $F$ concave,
$R(s,t) \le 0$ for all $s \le t$; see e.g. Niculescu and Persson [19], pages 50 and 62-63
for an exposition and many interesting extensions and generalizations. Lemmas A4 
and A5 give additional information under the added hypotheses that F^ exists 
and is convex. 

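Proof. Since $g_s(t) \equiv R(s,t)$ has first three derivatives given by

\begin{align*}
g_s^{(1)}(t) &= \frac{d}{dt} R_s(t) = \tfrac{1}{2} f(t)(t-s) + \tfrac{1}{2}\big(F(t) + F(s)\big) - F(t) \\
&= \tfrac{1}{2} f(t)(t-s) - \tfrac{1}{2}\big(F(t) - F(s)\big) \overset{t=s}{=} 0, \\
g_s^{(2)}(t) &= \frac{d^2}{dt^2} R_s(t) = \tfrac{1}{2} f'(t)(t-s) + \tfrac{1}{2}\big(f(t) - f(t)\big) \overset{t=s}{=} 0, \\
g_s^{(3)}(t) &= \frac{d^3}{dt^3} R_s(t) = \tfrac{1}{2} f''(t)(t-s) + \tfrac{1}{2} f'(t),
\end{align*}

we can write $R(s,t)$ as a Taylor expansion with integral form of the remainder: for $s < t$,

\begin{align*}
R(s,t) = g_s(t) &= g_s(s) + g_s'(s)(t-s) + \tfrac{1}{2} g_s''(s)(t-s)^2 + \frac{1}{2!}\int_s^t g_s^{(3)}(x)(t-x)^2\,dx \\
&= 0 + \tfrac{1}{2}\int_s^t \Big(\tfrac{1}{2} f''(x)(x-s) + \tfrac{1}{2} f'(x)\Big)(t-x)^2\,dx \\
&= \tfrac{1}{4}\int_s^t f'(x)(t-x)^2\,dx + \tfrac{1}{4}\int_s^t f''(x)(x-s)(t-x)^2\,dx \\
&= \tfrac{1}{4}\int_s^t \{f'(s) + f''(x^*)(x-s)\}(t-x)^2\,dx + \tfrac{1}{4}\int_s^t f''(x)(x-s)(t-x)^2\,dx \\
&= \tfrac{1}{12} f'(s)(t-s)^3 + \tfrac{1}{4}\int_s^t \{f''(x^*) + f''(x)\}(x-s)(t-x)^2\,dx,
\end{align*}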

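where $|x^* - x| \le |x - s|$ for each $x \in [s,t]$. Since $\int_s^t (x-s)(t-x)^2\,dx = (t-s)^4/12$, we find that the inequalities (25) hold. □

Lemma 4.5. Let $r_{n,i} \equiv P(h_{a_{i-1},a_i}) = R(a_{i-1}, a_i)$, $i = j, j+1$, $\underline{f}''_j = \inf_{t \in [a_{j-1},a_j]} f''(t)$ and $\overline{f}''_j = \sup_{t \in [a_{j-1},a_j]} f''(t)$. Then there exists $a_j^* \in [a_{j-1}, a_j] \equiv I_j$ such that

\[ \frac{r_{n,j}}{(\Delta_j a)^3} - \frac{r_{n,j+1}}{(\Delta_{j+1}a)^3} \le -\tfrac{1}{12} f''(a_j^*)\,\Delta_j a + \tfrac{1}{24}\big(\overline{f}''_j\,\Delta_j a - \underline{f}''_{j+1}\,\Delta_{j+1}a\big). \]

Proof. In view of (25), we have

\[ r_{n,j} \ \begin{cases} \le \tfrac{1}{12} f'(a_{j-1})(\Delta_j a)^3 + \tfrac{1}{24} \sup_{x \in I_j} f''(x)\,(\Delta_j a)^4, \\[2pt] \ge \tfrac{1}{12} f'(a_{j-1})(\Delta_j a)^3 + \tfrac{1}{24} \inf_{x \in I_j} f''(x)\,(\Delta_j a)^4, \end{cases} \]

\[ r_{n,j+1} \ \begin{cases} \le \tfrac{1}{12} f'(a_j)(\Delta_{j+1}a)^3 + \tfrac{1}{24} \sup_{x \in I_{j+1}} f''(x)\,(\Delta_{j+1}a)^4, \\[2pt] \ge \tfrac{1}{12} f'(a_j)(\Delta_{j+1}a)^3 + \tfrac{1}{24} \inf_{x \in I_{j+1}} f''(x)\,(\Delta_{j+1}a)^4, \end{cases} \]

and hence

\begin{align*}
\frac{r_{n,j}}{(\Delta_j a)^3} - \frac{r_{n,j+1}}{(\Delta_{j+1}a)^3}
&\le \tfrac{1}{12} f'(a_{j-1}) + \tfrac{1}{24} \sup_{x \in I_j} f''(x)\,\Delta_j a - \tfrac{1}{12} f'(a_j) - \tfrac{1}{24} \inf_{x \in I_{j+1}} f''(x)\,\Delta_{j+1}a \\
&= -\tfrac{1}{12} f''(a_j^*)\,\Delta_j a + \tfrac{1}{24}\big(\overline{f}''_j\,\Delta_j a - \underline{f}''_{j+1}\,\Delta_{j+1}a\big),
\end{align*}

where $a_j^* \in I_j$. □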
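The two-sided bound (25), on which both lemmas rest, can be sanity-checked numerically. The sketch below is illustrative only and assumes the concrete choice $F(x) = 1 - e^{-x}$ (so $f' = -e^{-x}$ and $f'' = e^{-x}$, with the supremum of $f''$ over $[s,t]$ attained at $s$ and the infimum at $t$); the function names are hypothetical. It also checks the identity $\int_s^t (x-s)(t-x)^2\,dx = (t-s)^4/12$ by a midpoint rule.

```python
import math

def trapezoid_error(s, t):
    """R(s, t) for F(x) = 1 - exp(-x), using the exact integral of F over [s, t]."""
    F_s, F_t = 1.0 - math.exp(-s), 1.0 - math.exp(-t)
    integral = (t - s) + math.exp(-t) - math.exp(-s)
    return 0.5 * (F_t + F_s) * (t - s) - integral

def bounds_25(s, t):
    """Lower and upper bounds of (25) for f(x) = exp(-x)."""
    h = t - s
    leading = -math.exp(-s) * h ** 3 / 12.0          # f'(s) (t-s)^3 / 12
    lower = leading + math.exp(-t) * h ** 4 / 24.0   # inf of f'' on [s, t] is f''(t)
    upper = leading + math.exp(-s) * h ** 4 / 24.0   # sup of f'' on [s, t] is f''(s)
    return lower, upper

def kernel_integral(s, t, m=2000):
    """Midpoint-rule value of the integral of (x - s)(t - x)^2 over [s, t]."""
    h = (t - s) / m
    total = 0.0
    for i in range(m):
        x = s + (i + 0.5) * h
        total += (x - s) * (t - x) ** 2
    return total * h

for s, t in [(0.0, 1.0), (0.5, 1.2), (2.0, 2.5)]:
    lower, upper = bounds_25(s, t)
    print(s, t, lower, trapezoid_error(s, t), upper)
```

In each case $R(s,t)$ is negative, as the Hadamard-Hermite remark predicts for concave $F$, and lies between the two bounds.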

5. Appendix 2: A "modernized" proof of Kiefer and Wolfowitz [14]

Define the following interpolated versions of $F$ and $\mathbb{F}_n$. For $k \ge 1$, let $a_j \equiv a_j^{(k)} \equiv F^{-1}(j/k)$ for $j = 1, \dots, k-1$, and set $a_0 \equiv \alpha_0(F)$ and $a_k \equiv \alpha_1(F)$. Using the notation of de Boor [5], Chapter III, let $L^{(k)} = I_2 F$ be the piecewise linear and continuous function on $\mathbb{R}$ satisfying

\[ L^{(k)}(a_j^{(k)}) = F(a_j^{(k)}), \qquad j = 0, \dots, k. \]

Similarly, define $L_n \equiv L_n^{(k)} \equiv I_2 \mathbb{F}_n$; thus

\[ L_n^{(k)}(x) = \mathbb{F}_n(a_j) + k\{\mathbb{F}_n(a_{j+1}) - \mathbb{F}_n(a_j)\}[L^{(k)}(x) - F(a_j)] \]

for $a_j \le x \le a_{j+1}$, $j = 0, \dots, k-1$. We will eventually let $k = k_n$ and then write $p_n = 1/k_n$ (so that $F(a_{j+1}) - F(a_j) = 1/k_n = p_n$).

The following basic lemma due to Marshall [17] plays a key role in the proof.

Lemma 5.1 (Marshall [17]). Let $\Psi$ be convex on $[0,1]$, and let $\Phi$ be a continuous real-valued function on $[0,1]$. Let

\[ \widehat{\Phi}(x) = \sup\{h(x) : h \text{ is convex and } h(z) \le \Phi(z) \text{ for all } z \in [0,1]\}. \]

Then

\[ \sup_{0 \le x \le 1} |\widehat{\Phi}(x) - \Psi(x)| \le \sup_{0 \le x \le 1} |\Phi(x) - \Psi(x)|. \]

Proof. Note that for all $y \in [0,1]$, either $\widehat{\Phi}(y) = \Phi(y)$, or $y$ is an interior point of a closed interval $I$ over which $\widehat{\Phi}$ is linear. For such an interval, either $\sup_{x \in I} |\widehat{\Phi}(x) - \Psi(x)|$ is attained at an endpoint of $I$ (where $\widehat{\Phi} = \Phi$), or it is attained at an interior point, where $\Psi < \widehat{\Phi}$. Since $\widehat{\Phi} \le \Phi$ on $[0,1]$, it follows that

\[ \sup_{0 \le x \le 1} |\widehat{\Phi}(x) - \Psi(x)| \le \sup_{0 \le x \le 1} |\Phi(x) - \Psi(x)|. \]

Here is a second proof (due to Robertson, Wright and Dykstra [21], page 329) that does not use continuity of $\Phi$. Let $\epsilon \equiv \|\Phi - \Psi\|_\infty$. Then $\Psi - \epsilon$ is convex, and $\Psi(x) - \epsilon \le \Phi(x)$ for all $x$. Thus for all $x$

\[ \Phi(x) \ge \widehat{\Phi}(x) \ge \Psi(x) - \epsilon, \]

and hence

\[ \epsilon \ge \Phi(x) - \Psi(x) \ge \widehat{\Phi}(x) - \Psi(x) \ge -\epsilon \]

for all $x$. This implies the claimed bound. □
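Marshall's lemma is easy to illustrate on a grid. The sketch below is an illustration under assumptions, not part of the proof: it computes the greatest convex minorant of a sampled function by a lower-convex-hull scan and compares sup-distances to a convex $\Psi$. The choices $\Psi(x) = x^2$ and $\Phi(x) = x^2 + 0.1\sin(20x)$, the grid size, and the function names are all hypothetical. The second proof above applies verbatim to the grid version, so the inequality should hold up to rounding.

```python
import math

def greatest_convex_minorant(xs, ys):
    """Greatest convex minorant of the points (xs[i], ys[i]), evaluated at xs.

    xs must be strictly increasing. Uses a lower-convex-hull scan.
    """
    hull = []  # indices of the vertices of the lower convex hull
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = ((xs[i1] - xs[i0]) * (ys[i] - ys[i0])
                     - (ys[i1] - ys[i0]) * (xs[i] - xs[i0]))
            if cross <= 0.0:
                hull.pop()  # vertex i1 lies on or above the chord from i0 to i
            else:
                break
        hull.append(i)
    gcm = [0.0] * len(xs)
    for left, right in zip(hull, hull[1:]):
        slope = (ys[right] - ys[left]) / (xs[right] - xs[left])
        for i in range(left, right + 1):
            gcm[i] = ys[left] + slope * (xs[i] - xs[left])
    return gcm

def sup_dist(u, v):
    """Sup-norm distance between two sampled functions."""
    return max(abs(a - b) for a, b in zip(u, v))

m = 500
xs = [i / m for i in range(m + 1)]
psi = [x * x for x in xs]                                # convex comparison function
phi = [x * x + 0.1 * math.sin(20.0 * x) for x in xs]     # non-convex perturbation
phi_hat = greatest_convex_minorant(xs, phi)
print(sup_dist(phi_hat, psi), sup_dist(phi, psi))
```

The concave version used in step A below follows by applying the same computation to $-\Phi$ and $-\Psi$.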
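Main steps:

A. By Marshall's lemma, for any concave function $h$, $\|\widehat{F}_n - h\| \le \|\mathbb{F}_n - h\|$.

B. $P_F(A_n) \equiv P_F\{L_n^{(k_n)} \text{ is concave on } [0,\infty)\} \nearrow 1$ as $n \to \infty$ if $k_n \equiv (C_0 \beta_1^2(F)\, n/\log n)^{1/3}$ for some absolute constant $C_0$.

C. On the event $A_n$, it follows from Marshall's lemma (step A) that

\begin{align*}
\|\widehat{F}_n - \mathbb{F}_n\| &= \|\widehat{F}_n - L_n^{(k_n)} + L_n^{(k_n)} - \mathbb{F}_n\| \\
&\le \|\widehat{F}_n - L_n^{(k_n)}\| + \|L_n^{(k_n)} - \mathbb{F}_n\| \\
&\le 2\|\mathbb{F}_n - L_n^{(k_n)}\| \\
&= 2\|\mathbb{F}_n - L_n^{(k_n)} - (F - L^{(k_n)}) + F - L^{(k_n)}\| \\
&\le 2\|\mathbb{F}_n - L_n^{(k_n)} - (F - L^{(k_n)})\| + 2\|F - L^{(k_n)}\| \\
&\equiv 2(D_n + E_n).
\end{align*}

D. $D_n$ is handled by a standard "oscillation theorem"; $E_n$ is handled by an analytic (deterministic) argument.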

Proof of (1) assuming B holds. Using the notation of de Boor [5], chapter III, we have

\[ \mathbb{F}_n - F - (L_n - L) = \mathbb{F}_n - F - I_2(\mathbb{F}_n - F). \]

But by (18) of de Boor [5], page 36, $\|g - I_2 g\| \le \omega(g; |a|)$, where $\omega(g; |a|)$ is the oscillation modulus of $g$ with maximum comparison distance $|a| = \max_j \Delta_j a$ (and note that de Boor's proof does not involve continuity of $g$). Thus it follows immediately that

\begin{align*}
D_n &= \|\mathbb{F}_n - F - (L_n - L)\| \\
&= \|\mathbb{F}_n - F - I_2(\mathbb{F}_n - F)\| \\
&\le \omega(\mathbb{F}_n - F; |a|) = n^{-1/2}\,\omega(\mathbb{U}_n; p_n),
\end{align*}

where $\mathbb{U}_n \equiv \sqrt{n}(\mathbb{G}_n - I)$ is the empirical process of $n$ i.i.d. Uniform(0,1) random variables. From Stute's theorem (see e.g. Shorack and Wellner [22], Theorem 14.2.1, page 542), $\limsup \omega(\mathbb{U}_n; p_n)/\sqrt{2p_n \log(1/p_n)} = 1$ almost surely if $p_n \to 0$, $np_n \to \infty$ and $\log(1/p_n)/(np_n) \to 0$. Thus we conclude that

\[ \|\mathbb{F}_n - F - (L_n - L)\| = O\big(n^{-1/2}\sqrt{p_n \log(1/p_n)}\big) = O\big((n^{-1}\log n)^{2/3}\big) \]

almost surely as claimed.
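The final rate is a short computation. As a sketch, assume $k_n = (C n/\log n)^{1/3}$ for a constant $C > 0$ (the form of the choice in step B; the value of $C$ affects only the implied constant) and $p_n = 1/k_n$:

```latex
\begin{align*}
p_n &= C^{-1/3}\Big(\frac{\log n}{n}\Big)^{1/3}, \qquad
\log(1/p_n) = \tfrac{1}{3}\big(\log n - \log\log n + \log C\big) \le \log n
\quad (n \text{ large}), \\
n^{-1/2}\sqrt{p_n \log(1/p_n)}
&\le n^{-1/2}\, C^{-1/6}\Big(\frac{\log n}{n}\Big)^{1/6}(\log n)^{1/2}
= C^{-1/6}\,(n^{-1}\log n)^{2/3}.
\end{align*}
```

The conditions of Stute's theorem hold for this choice, since $np_n \to \infty$ and $\log(1/p_n)/(np_n) \to 0$.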

To handle $E_n$, we use the bound given by de Boor [5], page 31, (2): $\|g - I_2 g\| \le 8^{-1}|a|^2\|g''\|$. Applying this to $g = F$, $I_2 g = L^{(k)}$ yields

\[ \|F - L^{(k)}\| = \|F - I_2 F\| \le \tfrac{1}{8}|a|^2\|F''\| \le \tfrac{1}{8}\gamma_1(F)\,p_n^2 = O\big((n^{-1}\log n)^{2/3}\big). \]

Combining the results for $D_n$ and $E_n$ yields the stated conclusion. □
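The deterministic bound for $E_n$ can be checked directly on an example. The following sketch is illustrative and rests on assumptions not in the text: it takes the density $f(x) = Ce^{-x}$ on $[0,1]$ with $C = 1/(1 - e^{-1})$, places knots at the quantiles $F^{-1}(j/k)$, and compares the sup-norm error of the piecewise linear interpolant on a fine grid with $|a|^2\|F''\|/8$. The function names are hypothetical.

```python
import math

C = 1.0 / (1.0 - math.exp(-1.0))  # normalizes f(x) = C * exp(-x) on [0, 1]

def F(x):
    """Distribution function of the truncated exponential on [0, 1]."""
    return C * (1.0 - math.exp(-x))

def F_inv(u):
    """Quantile function of F."""
    return -math.log(1.0 - u / C)

def linear_interpolation_error(k, points_per_cell=50):
    """Return (grid sup-norm error of I_2 F, the bound |a|^2 ||F''|| / 8)."""
    knots = [F_inv(j / k) for j in range(k + 1)]
    mesh = max(knots[j + 1] - knots[j] for j in range(k))
    err = 0.0
    for j in range(k):
        left, right = knots[j], knots[j + 1]
        for m in range(points_per_cell + 1):
            w = m / points_per_cell
            x = left + w * (right - left)
            chord = (1.0 - w) * F(left) + w * F(right)
            err = max(err, abs(F(x) - chord))
    sup_second_derivative = C  # |F''(x)| = C * exp(-x) is largest at x = 0
    return err, mesh * mesh * sup_second_derivative / 8.0

for k in (5, 20, 80):
    print(k, linear_interpolation_error(k))
```

Both the error and the bound shrink at the rate $k^{-2} = p_n^2$, which is what the proof uses.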

It remains to show that B holds. To do this we use the following lemma.

Lemma 5.2. If $p_n \to 0$ and $\delta_n \to 0$, then for the uniform(0,1) d.f. $F = I$,

\[ P\big(|\mathbb{G}_n(p_n) - p_n| \ge \delta_n p_n\big) \le 2\exp\big(-\tfrac{1}{2} n p_n \delta_n^2 (1 + o(1))\big), \]

where the $o(1)$ term depends only on $\delta_n$.

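Proof. From Shorack and Wellner [22], Lemma 10.3.2, page 415,

\[ P\big(\mathbb{G}_n(p_n)/p_n \ge \lambda\big) \le P\Big(\sup_{p_n \le t \le 1} \frac{\mathbb{G}_n(t)}{t} \ge \lambda\Big) \le \exp(-n p_n h(\lambda)), \]

where $h(x) = x(\log x - 1) + 1$. Hence

\[ P\Big(\frac{\mathbb{G}_n(p_n) - p_n}{p_n} \ge \lambda\Big) \le \exp(-n p_n h(1 + \lambda)), \]

where $h(1 + \lambda) \sim \lambda^2/2$ as $\lambda \downarrow 0$, by Shorack and Wellner [22], (11.1.7), page 44. Similarly, using Shorack and Wellner [22], (10.3.6) on page 416,

\[ P\Big(\frac{p_n - \mathbb{G}_n(p_n)}{p_n} \ge \lambda\Big) \le \exp(-n p_n h(1 - \lambda)), \]

where $h(1 - \lambda) \sim \lambda^2/2$ as $\lambda \downarrow 0$. Thus the conclusion follows with $o(1)$ depending only on $\delta_n$. □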
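For a fixed $t = p_n$, $n\mathbb{G}_n(p_n)$ is Binomial$(n, p_n)$, so the one-point versions of the two exponential bounds are the classical multiplicative Chernoff bounds and can be compared with exact binomial tails. The sketch below does this for illustrative values of $n$, $p$ and $\lambda$ chosen here (they are not from the text), and also checks $h(1 \pm \lambda) \approx \lambda^2/2$ for small $\lambda$. The function names are hypothetical.

```python
import math

def h(x):
    """h(x) = x (log x - 1) + 1, as in the proof of Lemma 5.2."""
    return x * (math.log(x) - 1.0) + 1.0

def binom_pmf(n, p, j):
    """P(Bin(n, p) = j)."""
    return math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)

def upper_tail(n, p, lam):
    """Exact P(G_n(p) - p >= lam * p) = P(Bin(n, p) >= (1 + lam) n p)."""
    threshold = (1.0 + lam) * n * p
    return sum(binom_pmf(n, p, j) for j in range(n + 1) if j >= threshold - 1e-9)

def lower_tail(n, p, lam):
    """Exact P(p - G_n(p) >= lam * p) = P(Bin(n, p) <= (1 - lam) n p)."""
    threshold = (1.0 - lam) * n * p
    return sum(binom_pmf(n, p, j) for j in range(n + 1) if j <= threshold + 1e-9)

n, p, lam = 200, 0.1, 0.5
print(upper_tail(n, p, lam), math.exp(-n * p * h(1.0 + lam)))
print(lower_tail(n, p, lam), math.exp(-n * p * h(1.0 - lam)))
print(h(1.01) / (0.01 ** 2 / 2.0), h(0.99) / (0.01 ** 2 / 2.0))
```

The exact tails sit well below the exponential bounds, and the last line shows both ratios close to 1, in line with $h(1 \pm \lambda) \sim \lambda^2/2$.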

Here is the lemma which is used to prove B. 

Lemma 5.3. If $\beta_1(F) > 0$ and $\gamma_1(F) < \infty$, then for $k_n$ large,

\[ 1 - P(A_n) \le 2k_n \exp\big(-n\beta_1^2(F)/80k_n^3\big). \]

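Proof. For $1 \le j \le k_n$, write

\[ T_{n,j} \equiv \mathbb{F}_n(a_j) - \mathbb{F}_n(a_{j-1}), \qquad \Delta_j a \equiv a_j - a_{j-1}. \]

By linearity of $L_n^{(k_n)}$ on the sub-intervals $[a_{j-1}, a_j]$,

\[ A_n = \bigcap_{j=1}^{k_n - 1} \Big\{ \frac{T_{n,j}}{\Delta_j a} \ge \frac{T_{n,j+1}}{\Delta_{j+1}a} \Big\} \equiv \bigcap_{j=1}^{k_n - 1} B_{n,j}. \]

Suppose that

\[ (26) \qquad |T_{n,i} - 1/k_n| \le \delta_n/k_n, \quad i = j, j+1; \quad \text{and} \quad \frac{\Delta_{j+1}a}{\Delta_j a} \ge 1 + 3\delta_n. \]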






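Then

\[ T_{n,j} \ge \frac{1}{k_n} - \frac{\delta_n}{k_n} = \frac{1 - \delta_n}{k_n}, \qquad T_{n,j+1} \le \frac{1 + \delta_n}{k_n}, \]

and it follows that for $\delta_n \le 1/3$

\[ \frac{T_{n,j}}{\Delta_j a} \ge \frac{1 - \delta_n}{k_n \Delta_j a} \ge \frac{1 + \delta_n}{k_n \Delta_{j+1}a} \ge \frac{T_{n,j+1}}{\Delta_{j+1}a}. \]

[$1 + 3\delta \ge (1 + \delta)/(1 - \delta)$ iff $(1 + 2\delta - 3\delta^2) \ge 1 + \delta$ iff $\delta - 3\delta^2 \ge 0$ iff $1 - 3\delta \ge 0$.]
Now the $\Delta$ part of (26) holds for $1 \le j \le k_n - 1$ provided $\delta_n = \beta_1(F)/(6k_n) \le 1/3$.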
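Proof: Since

\[ \frac{d}{dt}F^{-1}(t) = \frac{1}{f(F^{-1}(t))}, \qquad \frac{d^2}{dt^2}F^{-1}(t) = \frac{-f'(F^{-1}(t))}{f^3(F^{-1}(t))}, \]

we can write

\[ \Delta_{j+1}a = F^{-1}\Big(\frac{j+1}{k_n}\Big) - F^{-1}\Big(\frac{j}{k_n}\Big) = k_n^{-1}\frac{1}{f(a_j)} + \frac{1}{2k_n^2}\,\frac{-f'(\xi)}{f^3(\xi)} \]

for some $a_j \le \xi \le a_{j+1}$, and

\[ \Delta_j a \le k_n^{-1}\frac{1}{f(a_j)}. \]

Combining these two inequalities yields

\[ \frac{\Delta_{j+1}a}{\Delta_j a} \ge 1 + \frac{1}{2k_n}\,f(a_j)\,\frac{-f'(\xi)}{f^3(\xi)} \ge 1 + \frac{1}{2k_n}\,\frac{-f'(\xi)}{f^2(\xi)} \ge 1 + \frac{1}{2k_n}\beta_1(F) = 1 + 3\delta_n \]

if $\delta_n \equiv \beta_1(F)/(6k_n)$.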
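The spacing inequality just derived can be checked on a concrete example. The sketch below is illustrative and makes assumptions not stated in this appendix: it uses the density $f(x) = Ce^{-x}$ on $[0,1]$ with $C = 1/(1 - e^{-1})$, and it assumes the definition $\beta_1(F) = \inf(-f'/f^2)$, which for this density equals $1/C$. The function names are hypothetical.

```python
import math

C = 1.0 / (1.0 - math.exp(-1.0))  # normalizes f(x) = C * exp(-x) on [0, 1]
BETA1 = 1.0 / C                   # assumed definition: inf of -f'(x)/f(x)^2 = exp(x)/C

def knots(k):
    """Quantile knots a_j = F^{-1}(j / k), j = 0..k, for F(x) = C (1 - exp(-x))."""
    return [-math.log(1.0 - j / (k * C)) for j in range(k + 1)]

def min_spacing_ratio(k):
    """Smallest value of Delta_{j+1} a / Delta_j a over j = 1..k-1."""
    a = knots(k)
    return min((a[j + 1] - a[j]) / (a[j] - a[j - 1]) for j in range(1, k))

for k in (10, 100, 1000):
    print(k, min_spacing_ratio(k), 1.0 + BETA1 / (2.0 * k))
```

In this example the smallest ratio exceeds $1 + \beta_1(F)/(2k)$ by a factor of roughly two in the excess over 1, reflecting the slack in bounding $\Delta_j a$ from above by $1/(k f(a_j))$.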
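Thus we conclude that

\begin{align*}
1 - P(A_n) = P\Big(\bigcup_{j=1}^{k_n - 1} B_{n,j}^c\Big) &\le \sum_{j=1}^{k_n - 1} P(B_{n,j}^c) \\
&\le \sum_{j=1}^{k_n - 1} 2P\big(|T_{n,j} - 1/k_n| > \delta_n/k_n\big) \\
&\le k_n\, 4\exp\big(-2^{-1} n p_n \delta_n^2 (1 + o(1))\big) \le 4k_n \exp\big(-n\beta_1^2(F)/80k_n^3\big)
\end{align*}

by using Lemma 5.2 and for $k_n$ sufficiently large (so that $(1 + o(1)) \ge 72/80$). □

Putting these results together yields Theorem 1.1.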